PREFACE



This publication describes the CRAY X-MP Series Model 48 Computer
System. It is written to assist programmers and engineers and assumes a
familiarity with digital computers.

The manual describes the overall computer system, its configurations, and 
equipment. It also describes the operation of the Central Processing 
Units that execute instructions, provide memory protection, report 
hardware exceptions, and provide interprocessor communications within the 
system. 

Details of the I/O Subsystem, the disk storage units, and the Solid-state 
Storage Device are given in the following publications:

HR-0030 I/O Subsystem Hardware Reference Manual 

HR-0630 Mass Storage Subsystem Hardware Reference Manual 

HR-0031 Solid-state Storage Device (SSD®) Reference Manual 



/////////////////////////////////////////////////////// 

WARNING 

This equipment generates, uses, and can radiate radio 
frequency energy and if not installed and used in 
accordance with the instructions manual, may cause 
interference to radio communications. It has been 
tested and found to comply with the limits for a Class
A computing device pursuant to Subpart J of Part 15 of
FCC Rules, which are designed to provide reasonable 
protection against such interference when operated in a 
commercial environment. Operation of this equipment in 
a residential area is likely to cause interference in 
which case the user at his own expense will be required 
to take whatever measures may be required to correct 
the interference. 

/////////////////////////////////////////////////////// 



HR-0097 iii 



CONTENTS 



PREFACE iii



1. SYSTEM DESCRIPTION 1-1

INTRODUCTION 1-1 

CONVENTIONS 1-1 

Italics 1-1 

Register conventions 1-3 

Number conventions 1-4 

Clock period 1-4 

SYSTEM COMPONENTS 1-4 

Central Processing Units 1-5 

Interfaces 1-7 

I/O Subsystem 1-8 

Disk storage units 1-10 

Solid-state Storage Device 1-11 

Condensing units 1-13 

Power distribution units 1-14 

Motor-generator units 1-15 

SYSTEM CONFIGURATION 1-16 



2. CPU SHARED RESOURCES 2-1 

INTRODUCTION 2-1 

CENTRAL MEMORY 2-1 

Memory organization 2-2 

Memory addressing 2-3 

Memory access 2-3 

Conflict resolution 2-5 

Bank Busy conflict 2-6 

Simultaneous Bank conflict 2-6 

Section Access conflict 2-6 

Memory access priorities 2-6 

Memory error correction 2-6 

INTER-CPU COMMUNICATION SECTION 2-9 

Real-time clock 2-9 

Inter-CPU communication and control 2-10 

Shared Address and Shared Scalar registers 2-11 

Semaphore registers 2-11 

Shared register and semaphore conflicts 2-12 




CPU INPUT/OUTPUT SECTION 2-12 

Data transfer for Solid-state Storage Device 2-14 

Data transfer for I/O Subsystem 2-14 

6 Mbyte per second channels 2-14 

Multi-CPU programming 2-15 

6 Mbyte per second channel operation 2-16 

Input channel programming 2-16 

Input channel error conditions 2-17 

Output channel programming 2-18 

Programmed master clear to external device 2-19 

Access to Central Memory 2-19 

I/O lockout 2-20 

Memory bank conflicts 2-20 

I/O memory conflicts 2-20 

I/O memory request conditions 2-23 

I/O memory addressing 2-23 



3. CPU CONTROL SECTION 3-1 

INTRODUCTION 3-1 

INSTRUCTION ISSUE AND CONTROL 3-1 

Program Address register 3-2 

Next Instruction Parcel register 3-2 

Current Instruction Parcel register 3-2 

Lower Instruction Parcel register 3-3 

Instruction buffers 3-3 

EXCHANGE MECHANISM 3-5 

Exchange package 3-5 

Processor number 3-7 

Vector not used (VNU) 3-7

Enable Second Vector logical (ESVL) 3-7 

Enhanced addressing mode (EAM) 3-8 

Memory error data 3-8 

Exchange registers 3-9 

Exchange Address register 3-9 

Mode register 3-9 

Flag register 3-11 

Cluster Number register 3-12 

Program State register 3-12 

A registers 3-12 

S registers 3-12 

Program Address register 3-13 

Memory field registers 3-13 

Active exchange package 3-13 

Exchange sequence 3-13 

Exchange initiated by deadstart sequence 3-14 




Exchange initiated by interrupt flag set 3-14 

Exchange initiated by program exit 3-14 

Exchange sequence issue conditions 3-15

Exchange package management 3-15

MEMORY FIELD PROTECTION 3-16

Instruction Base Address register 3-17

Instruction Limit Address register 3-17 

Data Base Address register 3-18 

Data Limit Address register 3-18 

Program range error 3-18 

Operand range error 3-19 

PROGRAMMABLE CLOCK 3-19 

Instructions 3-19 

Interrupt Interval register 3-19 

Interrupt Countdown counter 3-20 

Clear programmable clock interrupt request 3-20 

PERFORMANCE MONITOR 3-20 

DEADSTART SEQUENCE 3-21 



4. CPU COMPUTATION SECTION 4-1 

INTRODUCTION 4-1 

OPERATING REGISTERS 4-3 

ADDRESS REGISTERS 4-3 

A registers 4-3 

B registers 4-5 

SCALAR REGISTERS 4-6 

S registers 4-6 

T registers 4-8 

VECTOR REGISTERS 4-9 

V registers 4-9 

V register reservations and chaining 4-12 

Vector control registers 4-13 

Vector Length register 4-13 

Vector Mask register 4-13 

FUNCTIONAL UNITS 4-13 

Address functional units 4-14

Address Add functional unit 4-14 

Address Multiply functional unit 4-14 

Scalar functional units 4-15 

Scalar Add functional unit 4-15 

Scalar Shift functional unit 4-15 

Scalar Logical functional unit 4-16 

Scalar Population/Parity/Leading Zero 

functional unit 4-16 

Vector functional units 4-16 

Vector functional unit reservation 4-16 




Vector Add functional unit 4-17 

Vector Shift functional unit 4-17 

Full Vector Logical functional unit 4-17 

Second Vector Logical functional unit 4-18 

Vector Population/Parity functional unit 4-18 

Floating-point functional units 4-19 

Floating-point Add functional unit 4-19 

Floating-point Multiply functional unit 4-19 

Reciprocal Approximation functional unit 4-20

ARITHMETIC OPERATIONS 4-20 

Integer arithmetic 4-20 

Floating-point arithmetic 4-21 

Normalized floating-point numbers 4-22 

Floating-point range errors 4-23 

Floating-point Add functional unit 4-24 

Floating-point Multiply functional unit 4-24 

Floating-point Reciprocal Approximation 

functional unit 4-26 

Double-precision numbers 4-26

Addition algorithm 4-27 

Multiplication algorithm 4-27 

Division algorithm 4-28 

Newton's method 4-30 

Derivation of the division algorithm 4-30 

LOGICAL OPERATIONS 4-35



5. CPU INSTRUCTIONS 5-1

INSTRUCTION FORMAT 5-1 

1-parcel instruction format with discrete j and k fields 5-1

1-parcel instruction format with combined j and k fields 5-1

2-parcel instruction format with combined j, k,

and m fields 5-2

2-parcel instruction format with combined i, j, k,

and m fields 5-3

SPECIAL REGISTER VALUES 5-4

INSTRUCTION ISSUE 5-5

INSTRUCTION DESCRIPTIONS 5-6 



APPENDIX SECTION 

A. INSTRUCTION SUMMARY A-1 






B. 6 MBYTE PER SECOND CHANNEL DESCRIPTIONS B-1 

INTRODUCTION B-1 

6 MBYTE PER SECOND INPUT CHANNEL SIGNAL SEQUENCE B-1 

Data bits 2^0 through 2^15 B-1

Parity bits 0 through 3 B-2

Ready signal B-3 

Resume signal B-3 

Disconnect signal B-3 

6 MBYTE PER SECOND OUTPUT CHANNEL SIGNAL SEQUENCE B-3 

Data bits 2^0 through 2^15 B-4

Parity bits 0 through 3 B-5

Ready signal B-5 

Resume signal B-5 

Disconnect signal B-5 

C. PERFORMANCE MONITOR C-1 

INTRODUCTION C-1 

SELECTING PERFORMANCE EVENTS C-1 

READING PERFORMANCE RESULTS C-3 

TESTING PERFORMANCE COUNTERS C-3 

D. SECDED MAINTENANCE FUNCTIONS D-1 

INTRODUCTION D-1 

VERIFICATION OF CHECK BIT STORAGE D-1 

VERIFICATION OF CHECK BIT GENERATION D-2 

VERIFICATION OF ERROR DETECTION AND CORRECTION D-2 

CLEARING MAINTENANCE MODE FUNCTIONS D-3 



FIGURES 

1-1 CRAY X-MP Model 48 mainframe with a Cray I/O Subsystem

and an SSD 1-2

1-2 Basic organization of the 

4-processor system 1-5 

1-3 Control and data paths for a single CPU 1-6 

1-4 Typical interface cabinet 1-8 

1-5 I/O Subsystem chassis 1-9 

1-6 DD-49 Disk Storage Unit 1-11

1-7 Solid-state Storage Device chassis 1-12 

1-8 Condensing unit 1-13 

1-9 Power distribution units 1-14 

1-10 Motor-generator equipment 1-15 

1-11 Block diagram of the four-processor system with 

full disk capacity 1-16 

1-12 Block diagram of the four-processor system with 

block multiplexer channels 1-17 

2-1 Central Memory organization for a 

4-processor system 2-2 




2-2 Memory address (64 banks) 2-3 

2-3 Memory data path with SECDED 2-7 

2-4 Error correction matrix 2-8 

2-5 Shared registers 2-10 

2-6 Basic I/O program flowchart 2-18 

2-7 Channel I/O control (shown for CPU 0) 2-21 

2-8 Input/output data paths (for CPU 0) 2-22 

3-1 Instruction issue and control elements 3-1 

3-2 Instruction buffers 3-3 

3-3 Exchange package for a 4-processor system 3-6 

4-1 Address registers and functional units 4-4 

4-2 Scalar registers and functional units 4-7 

4-3 Vector registers and functional units 4-10 

4-4 Integer data formats 4-21 

4-5 Floating-point data format 4-22 

4-6 Exponent matrix for Floating-point Multiply unit 4-24 

4-7 Integer multiply in Floating-point 

Multiply functional unit 4-26 

4-8 49-bit floating-point addition 4-27 

4-9 Floating-point multiply partial-product sums pyramid 4-29 

4-10 Newton's method 4-30 

5-1 General form for instructions 5-1 

5-2 1-parcel instruction format with discrete j and k fields 5-2

5-3 1-parcel instruction format with combined j and k fields 5-2

5-4 2-parcel instruction format with combined j, k, and m fields 5-3

5-5 2-parcel instruction format for a branch with combined i, j,

k, and m fields 5-4

5-6 2-parcel instruction format for a 24-bit immediate constant

with combined i, j, k, and m fields 5-4

5-7 Vector left double shift, first element, VL greater than 1 5-72

5-8 Vector left double shift, second element, VL greater than 2 5-72

5-9 Vector left double shift, last element 5-72 

5-10 Vector right double shift, first element 5-73 

5-11 Vector right double shift, second element, 

VL greater than 1 5-74 

5-12 Vector right double shift, last operation 5-74 



TABLES 

1-1 CRAY X-MP four-processor system characteristics 1-3 

2-1 Channel word assembly/disassembly 2-17 

3-1 Exchange Package assignments 3-7 

B-1 Input channel signal exchange B-2 

B-2 Output channel signal exchange B-4 

C-1 Performance counter group descriptions C-2 



INDEX 






SYSTEM DESCRIPTION 



INTRODUCTION 

The CRAY X-MP model 48 Computer System is a powerful, general purpose 
machine that contains four central processing units (CPUs). Like all
CRAY X-MP multiprocessor systems, it is able to achieve extremely high 
multiprocessing rates by efficiently using the scalar and vector 
capabilities of all CPUs combined with the system's random-access 
solid-state memory (RAM) and shared registers. 

Vector processing is the performance of iterative operations on sets of 
ordered data. When two or more vector operations are chained together, 
two or more operations can be executing each 9.5-nanosecond clock period,
greatly exceeding the computational rates of conventional scalar 
processing. Scalar operations complement the vector capability by 
providing solutions to problems not readily adaptable to vector 
techniques. 

The machine has very high performance levels, and equipment options allow 
systems to be configured for a particular use. Central Memory of the 
4-processor mainframe is 8 million 64-bit words (see table 1-1). The 
system is compatible with all existing models of the Cray I/O Subsystem 
and its associated mass storage subsystem. In addition, an optional 
high-performance Cray Solid-state Storage Device (SSD) can be attached to 
the mainframe. Figure 1-1 illustrates the mainframe with a Cray I/O 
Subsystem and an SSD. 

This section describes system components and configurations. Table 1-1 
gives overall system characteristics. 



CONVENTIONS 

The following conventions are used in this manual. 

ITALICS 

Italicized lowercase letters, such as jk, indicate variable information.



Figure 1-1. CRAY X-MP model 48 mainframe with
a Cray I/O Subsystem and an SSD



Table 1-1. CRAY X-MP 4-processor system characteristics

Configuration   - Mainframe with 4 Central Processing Units (CPUs)
                - I/O Subsystem with 2, 3, or 4 I/O Processors
                - Optional Solid-state Storage Device (SSD)

CPU speed       - 9.5 ns CPU clock period
                - 105 million floating-point additions per second
                  per CPU
                - 105 million floating-point multiplications per
                  second per CPU
                - 105 million half-precision floating-point divisions
                  per second per CPU
                - 33 million full-precision floating-point divisions
                  per second per CPU
                - Simultaneous floating-point addition, multiplication,
                  and reciprocal approximation within each CPU

Memory          - Mainframe has 8 million (model 48) 64-bit words in
                  Central Memory

Input/Output    - Two 1250 Mbyte per second channel pairs for interface
                  to Solid-state Storage Device (SSD)
                - Four 100 Mbyte per second channel pairs for interface
                  to I/O Subsystem
                - Four 6 Mbyte per second channel pairs

Physical        - 64 sq ft floor space for mainframe
                - 15 sq ft floor space for I/O Subsystem
                - 15 sq ft floor space for SSD
                - 5.65 tons, mainframe weight
                - 1.5 tons, I/O Subsystem weight
                - 1.5 tons, SSD weight
                - Liquid refrigeration of each chassis
                - 400 Hz power from motor-generators



REGISTER CONVENTIONS 

Parenthesized register names are used frequently in this manual as a form
of shorthand notation for the expression "the contents of register."
For example, "Branch to (P)" means "Branch to the address indicated by the
contents of register P."

Designations for the A, B, S, T, and V registers are used extensively.
For example, "Transmit (Tjk) to Si" means "Transmit the contents of
the T register specified by the jk designators to the S register
specified by the i designator."

Register bits are numbered right to left as powers of 2, starting with
2^0. Bit 2^63 of an S, V, or T register value represents the most
significant bit. Bit 2^23 of an A or B register value represents the
most significant bit. (A and B registers are 24 bits.) The numbering
conventions for the Exchange Package and the Vector Mask register are
exceptions. Bits in the Exchange Package are numbered from left to right
and are not numbered as powers of 2 but as bits 0 through 63, with 0 as
the most significant and 63 as the least significant. The Vector Mask
register has 64 bits, each corresponding to a word element in a vector
register. Bit 2^63 corresponds to element 0; bit 2^0 corresponds to
element 63.
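
The bit-numbering rules above can be sketched in a few lines of Python.
This is an illustrative sketch only: the function names are hypothetical,
and the 64-bit width used for S, T, and V register values is an assumption
that follows from the 64-bit word size rather than a definition given in
this subsection.

```python
# Register widths in bits. A and B are stated to be 24 bits; 64 bits for
# S, T, and V is assumed from the 64-bit word size.
REGISTER_WIDTH = {"A": 24, "B": 24, "S": 64, "T": 64, "V": 64}


def msb_power(register: str) -> int:
    """Power of 2 that labels the most significant bit of a register."""
    return REGISTER_WIDTH[register] - 1


def exchange_bit_to_power(bit_number: int, width: int = 64) -> int:
    """Convert an Exchange Package bit number (0 = leftmost, most
    significant) to the equivalent power-of-2 position."""
    if not 0 <= bit_number < width:
        raise ValueError("bit number out of range")
    return width - 1 - bit_number


def vector_mask_power(element: int) -> int:
    """Power-of-2 position of the Vector Mask bit for a vector element.
    Element 0 maps to the leftmost bit, element 63 to the rightmost."""
    if not 0 <= element <= 63:
        raise ValueError("element out of range")
    return 63 - element
```

For example, vector_mask_power(0) gives 63 and msb_power("A") gives 23,
matching the conventions stated in the text.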



NUMBER CONVENTIONS 

Unless otherwise indicated, numbers in this manual are decimal numbers. 
Octal numbers are indicated with an 8 subscript. Exceptions are register 
numbers, channel numbers, instruction parcels in instruction buffers, and 
instruction forms which are given in octal without the subscript. 



CLOCK PERIOD 

The basic unit of CPU computation time is 9.5 nanoseconds (ns) and is 
referred to as a clock period (CP). Instruction issue, memory references,
and other timing considerations are often measured in CPs. 
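
Because timings throughout the manual are quoted in CPs, a small
conversion helper can be useful. The sketch below is not part of the
manual; the names are hypothetical. It converts clock periods to
nanoseconds and shows how one result per CP corresponds to roughly
105 million results per second, the per-CPU rate quoted in table 1-1.

```python
CLOCK_PERIOD_NS = 9.5  # CPU clock period in nanoseconds, from the text


def cp_to_ns(clock_periods: float) -> float:
    """Convert a duration in clock periods (CPs) to nanoseconds."""
    return clock_periods * CLOCK_PERIOD_NS


def results_per_second(results_per_cp: float = 1.0) -> float:
    """Peak rate for a unit that delivers results_per_cp results each CP."""
    return results_per_cp / (CLOCK_PERIOD_NS * 1e-9)


if __name__ == "__main__":
    print(cp_to_ns(4))                       # a 4-CP interval in ns
    print(results_per_second() / 1e6)        # millions of results per second
```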



SYSTEM COMPONENTS 

The 4-processor system is composed of a mainframe and an I/O Subsystem. 
Mass storage devices, front-end interfaces, and optional tape devices are 
also integral parts of a system. Optionally, a Cray Solid-state Storage 
Device (SSD) can be part of the system. Supporting this equipment are 
condensing units for refrigeration, motor-generators to provide system 
power, and power distribution units for the mainframe, I/O Subsystem, and 
SSD. System components are described on the following pages.



CENTRAL PROCESSING UNITS

Each CPU has independent control and computation sections. All CPUs share
Central Memory and the inter-CPU communication and I/O sections. (CPU
sections are described in later sections.) Figure 1-1 shows the mainframe
chassis. Figure 1-2 illustrates the basic organization of the computer;
figure 1-3 illustrates the components and control and data paths of each
CPU in the system.



[Figure 1-2 is a block diagram. The labels recovered from it are
summarized here.]

Each of the four CPUs:

   CONTROL SECTION
      • Instruction buffers
      • Control registers
      • Exchange mechanism
      • Interrupt
      • Programmable clock
      • Status register

   COMPUTATION SECTION
      • Registers
      • Functional units

Shared by all CPUs:

   CPU COMMUNICATION SECTION
      • Shared registers
      • Semaphore registers
      • Real-time Clock register

   MEMORY SECTION
      • 8 million 64-bit words

   I/O SECTION
      • Four 6 Mbyte per second channel pairs
      • Two 1250 Mbyte per second channel pairs
      • Four 100 Mbyte per second channel pairs



Figure 1-2. Basic organization of the 4-processor system



Figure 1-3. Control and data paths for a single CPU



INTERFACES

The Cray system is designed for use with front-end computers in a
computer network. A front-end computer system is self-contained and
executes under the control of its own operating system.

Standard interfaces connect the Cray mainframe's I/O channels to channels
of front-end computers, providing input data to the Cray system and
receiving output from it for distribution to peripheral equipment.
Interfaces compensate for differences in channel widths, machine word
size, electrical logic levels, and control signals. (The Master I/O
Processor of the I/O Subsystem communicates with the mainframe through a
6 Mbyte per second channel pair to a channel adapter module in the Cray
mainframe.) Communication continues through a front-end interface to
the front-end computer, typically through a front-end computer I/O channel.

The front-end interface is housed in a stand-alone cabinet (figure 1-4) 
located near the host computer. Its operation is invisible to the 
front-end computer user and the Cray user. 

A primary goal of the interface is to maximize the use of the front-end 
channel connected to the Cray system. Since the MIOP channel connected 
to the interface is faster than any front-end channel connected to the 
interface, the burst rate of the interface is limited by the maximum rate 
of the front-end channel. 

Interfaces to front-end computers allow the front-end computers to 
service the Cray Computer System in the following ways: 

• As a master operator station 

• As a local operator station 

• As a local batch entry station 

• As a data concentrator for multiplexing several other stations 
into a single Cray channel 

• As a remote batch entry station 

• As an interactive communication station

Peripheral equipment attached to the front-end computer varies depending 
on the use of the Cray system. 







Figure 1-4. Typical interface cabinet 



I/O SUBSYSTEM 

The I/O Subsystem, shown in figure 1-5, is standard on the CRAY X-MP
system and has two, three, or four I/O Processors (IOPs), Buffer Memory,
and required interfaces. The I/O Subsystem is designed to provide fast
data transfer between its Buffer Memory and the mainframe's Central
Memory as well as front-end computers, peripheral devices, and storage
devices.

Four types of I/O Processors may be configured in an I/O Subsystem: a
Master IOP (MIOP), a Buffer IOP (BIOP), a Disk IOP (DIOP), and an
Auxiliary IOP (XIOP). All I/O Subsystems must have at least one MIOP and
one BIOP. The number of DIOPs and XIOPs is site dependent.

Each IOP of the I/O Subsystem has a memory section, a control section, a
computation section, and an input/output section. Input/output sections
are independent and handle some portion of the I/O requirements for the
Subsystem. Each IOP also has six direct memory access ports to its Local
Memory.

The Master I/O Processor (MIOP) controls the front-end interfaces and the
standard group of station† peripherals. The Peripheral Expander
interfaces the station peripherals to one direct memory access (DMA) port
of the MIOP. The MIOP also connects to Buffer Memory and to the
mainframe over a 6 Mbyte per second channel pair. The MIOP communicates
with the Cray Operating System (COS) to coordinate the activities of the
entire I/O Subsystem.



Figure 1-5. I/O Subsystem chassis



† The term station means both hardware and software. Station is the
link to the front end or can act as a limited front end (as the MIOP).

The Buffer I/O Processor (BIOP) is the main link between the mainframe's 
Central Memory and the mass storage devices. Data from mass storage is 
transferred through the BIOP's Local Memory to the mainframe's Central 
Memory through a 100 Mbyte per second channel pair. 

The Disk I/O Processor (DIOP) is used for additional disk storage units. 
This processor can handle up to four disk controller units with up to 16 
disk storage units. The DIOP uses one DMA port for each controller, one 
DMA port to connect to Buffer Memory, and another DMA port to connect a 
100 Mbyte per second channel pair to the mainframe Central Memory. 

The Auxiliary I/O Processor (XIOP) is used for block multiplexer channels 
and interfaces to a maximum of four BMC-4 Block Multiplexer Controllers. 
Each controller can handle up to four block multiplexer channels. The 
XIOP uses one DMA port for each controller and another DMA port to 
connect with Buffer Memory. 

I/O Subsystem hardware allows for simultaneous data transfers between the 
BIOP and DIOP or XIOP of the I/O Subsystem and the mainframe's Central 
Memory.††

The CPU input/output section for the system is described in section 2 of 
this manual. Refer to the I/O Subsystem Reference Manual, CRI 
publication HR-0030, for a complete description of the I/O Subsystem. 



DISK STORAGE UNITS 

For mass storage, the system uses Cray Research, Inc., disk storage units 
(DSUs). A disk controller unit (DCU) interfaces the disk storage units
with an I/O Processor of an I/O Subsystem through one direct memory 
access (DMA) port. Up to four disk storage units can be connected to a 
single DCU. 

The I/O Processor and the disk controller unit can transfer data between 
the DMA port and four DSUs with all DSUs operating at full speed without 
missing data or skipping revolutions. A minimum of 2 and a maximum of 48 
DSUs can be configured on an I/O Subsystem. Figure 1-6 shows a Cray 
DD-49 Disk Storage Unit. The disk controller unit (DCU) is housed in the 
I/O Subsystem chassis. 



†† Software to support the 100 Mbyte per second channel pair to the
XIOP is currently not available.

Each DSU has two accesses for connecting it to controllers. The second
independent data path to each DSU exists through another Cray Research, 
Inc., controller. Reservation logic provides controlled access to each 
DSU. Dynamic sharing of devices is not supported by the Cray Operating 
System (COS) software. Further information about the mass storage 
subsystem is included in the I/O Subsystem Reference Manual, CRI 
publication HR-0030, and the Mass Storage Subsystem Hardware Reference 
Manual, CRI publication HR-0630. 




Figure 1-6. DD-49 Disk Storage Unit 



SOLID-STATE STORAGE DEVICE 

The Solid-state Storage Device (SSD) shown in figure 1-7 is used for 
temporary data storage and transfers data to and from the mainframe's 
Central Memory. The transfer speed is dependent on the SSD memory size 
and configuration as described in the Solid-state Storage Device (SSD) 
Reference Manual, CRI publication HR-0031. The maximum speed attained
from the SSD to Central Memory is 1250 Mbytes per second for each 1250
Mbyte per second channel.



Figure 1-7. Solid-state Storage Device chassis



CONDENSING UNITS

Condensing units (figure 1-8) contain the major components of the 
refrigeration system used to cool the computer chassis and consist of two 
25-ton condensers. Heat is removed from the condensing unit by a second 
level cooling system that is not part of the computer system. Freon, 
which cools the computer, picks up heat and transfers it to water in the 
condensing unit. 





Figure 1-8. Condensing unit



POWER DISTRIBUTION UNITS

The mainframe, I/O Subsystem, and SSD all operate from 400 Hz 3-phase
power. The mainframe, I/O Subsystem, and SSD have independent power
distribution units.

The power distribution unit for the mainframe contains adjustable 
transformers for regulating the voltage to each column of the mainframe. 
The power distribution unit also contains temperature and voltage 
monitoring equipment that checks temperatures at strategic locations on 
the mainframe chassis. Automatic warning and shutdown circuitry protects 
the mainframe in case of overheating or excessive cooling. Control 
switches for the motor-generators and the condensing unit are mounted on 
the mainframe power distribution unit. 

A smaller power distribution unit performs similar functions for the I/O 
Subsystem chassis or the SSD chassis. 

Figure 1-9 shows the power distribution units for the mainframe (left) 
and for the I/O Subsystem or SSD (right).



Figure 1-9. Power distribution units



MOTOR-GENERATOR UNITS

The motor-generator units convert primary power from the commercial power 
mains to the 400 Hz power used by the system. These units isolate the 
system from transients and fluctuations on the commercial power mains. 
The equipment consists of two or three motor-generator units and a 
control cabinet. Figure 1-10 shows a typical motor-generator and its 
control cabinet. 





Figure 1-10. Motor-generator equipment



SYSTEM CONFIGURATION

Figures 1-11 and 1-12 illustrate two configurations for the CRAY X-MP
model 48.



Figure 1-11. Block diagram of the 4-processor system
with full disk capacity



Figure 1-12. Block diagram of the 4-processor system
with block multiplexer channels



CPU SHARED RESOURCES



INTRODUCTION 

All four central processing units (CPUs) share the mainframe's Central 
Memory, the inter-CPU communication section, and the input/output 
section. These areas common to all CPUs are described in the following 
pages. 



CENTRAL MEMORY 

Central Memory consists of a number of banks of solid-state,
random-access memory (RAM) and is shared by the CPUs and the I/O
section. Standard Central Memory size for a 4-processor system is 8
million words with 64 banks. Banks are independent of each other. Each
word is 72 bits with 64 data bits and 8 check bits. Sequentially
addressed words reside in sequential banks.

Central Memory cycle time is 4 clock periods (CPs) or 38 nanoseconds
(ns). Access time, the time required to fetch an operand from Central
Memory to an operating register, is 14 CPs (133 ns) for A (address) and S
(scalar) registers. Access time is 17 CPs + vector length for a V
(vector) register and 16 CPs + block length for a block transfer to a B
(intermediate address) or T (intermediate scalar) register.
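
The access-time figures above can be sketched as simple formulas. The
following Python is illustrative only; the function names are
hypothetical, and the sketch assumes the stated formulas apply without
memory conflicts, with vector length and block length counted in words.

```python
CLOCK_PERIOD_NS = 9.5  # CPU clock period in nanoseconds


def scalar_access_cps() -> int:
    """Access time in CPs for a load to an A or S register."""
    return 14


def vector_access_cps(vector_length: int) -> int:
    """Access time in CPs for a load to a V register (17 CPs + length)."""
    if not 1 <= vector_length <= 64:
        raise ValueError("vector length must be 1 through 64")
    return 17 + vector_length


def block_access_cps(block_length: int) -> int:
    """Access time in CPs for a block transfer to B or T registers
    (16 CPs + block length)."""
    if block_length < 1:
        raise ValueError("block length must be at least 1")
    return 16 + block_length


def cps_to_ns(clock_periods: int) -> float:
    """Convert clock periods to nanoseconds."""
    return clock_periods * CLOCK_PERIOD_NS
```

Under these assumptions a full 64-element vector load takes 81 CPs, and
the 14-CP scalar access corresponds to 133 ns.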

The maximum transfer rate per CPU for B, T, and V registers is three 
words per CP; for A and S registers per CPU, it is one word every 2 CPs. 
Transfer of instructions to instruction buffers occurs at a rate of 32 
parcels (8 words) per CP. For the I/O section, the transfer rate is 4 
words per CP. 
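
To put these rates in more familiar units, the sketch below converts
words per CP into bytes per second. It is an illustration, not a figure
from the manual: the function name is hypothetical, and the calculation
counts only the 64 data bits (8 bytes) of each word, ignoring check bits
and any conflicts that would reduce the sustained rate.

```python
CLOCK_PERIOD_S = 9.5e-9   # CPU clock period in seconds
BYTES_PER_WORD = 8        # 64 data bits per word; check bits excluded


def peak_bytes_per_second(words_per_cp: float) -> float:
    """Peak transfer rate in bytes per second for a given words-per-CP rate."""
    return words_per_cp * BYTES_PER_WORD / CLOCK_PERIOD_S


# Rates quoted in the text, in words per CP
RATES = {
    "B, T, and V registers (per CPU)": 3.0,
    "A and S registers (per CPU)": 0.5,
    "instruction buffers": 8.0,
    "I/O section": 4.0,
}

if __name__ == "__main__":
    for name, words_per_cp in RATES.items():
        gbytes = peak_bytes_per_second(words_per_cp) / 1e9
        print(f"{name}: {gbytes:.2f} Gbytes per second")
```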

Central Memory features are summarized below and are described in detail 
in the following paragraphs. 

• Shared access from all CPUs

• 8 million words of integrated circuit memory

• 64 data bits and 8 error correction bits per word

• 64 interleaved banks

• 4-CP bank cycle time

• Single error correction/double error detection (SECDED)

• 3 words per CP transfer rate to B, T, and V registers per CPU 

• 1 word per 2 CP transfer rate to A and S registers per CPU 

• 8 words per CP transfer rate to instruction buffers 

• 4 words per CP transfer rate to I/O concurrent with all memory 
activity except instruction fetch and exchange 



MEMORY ORGANIZATION 

Memory is organized to provide fast, efficient access for all CPUs. Data
transfers to and from memory are protected by single error correction,
double error detection (SECDED). Central Memory is organized into four
sections with 16 banks in each section.

Each CPU is connected to an independent access path into each of the four
sections, as shown in figure 2-1. This configuration allows up to 16
memory references per clock period.



Figure 2-1. Central Memory organization
for a 4-processor system



MEMORY ADDRESSING

A word in a 64-bank memory is addressed in a maximum of 23 bits as shown 
in figure 2-2. The low-order 6 bits specify one of the 64 banks. The 
next 14-bit field specifies an address within the chip. The high-order 3 
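bits specify one chip on the module.



 2^22      2^20  2^19                        2^6  2^5       2^0
+---------------+-------------------------------+---------------+
| Chip address  |    Internal bit address       |  6-bit bank   |
| select        |    in chip                    |               |
+---------------+-------------------------------+---------------+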



Figure 2-2. Memory address (64 banks) 
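
The field layout in figure 2-2 can be expressed as a short decoding
routine. This Python sketch is for illustration only; the function and
constant names are hypothetical. It also shows why sequentially addressed
words fall in sequential banks: the bank number is taken from the
low-order 6 bits of the address.

```python
BANK_BITS = 6          # low-order bits: one of 64 banks
CHIP_ADDR_BITS = 14    # next field: address within the chip
CHIP_SELECT_BITS = 3   # high-order bits: chip on the module
ADDRESS_BITS = BANK_BITS + CHIP_ADDR_BITS + CHIP_SELECT_BITS  # 23


def decode_address(address: int):
    """Split a 23-bit word address into (chip_select, chip_address, bank)."""
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 23 bits")
    bank = address & ((1 << BANK_BITS) - 1)
    chip_address = (address >> BANK_BITS) & ((1 << CHIP_ADDR_BITS) - 1)
    chip_select = address >> (BANK_BITS + CHIP_ADDR_BITS)
    return chip_select, chip_address, bank


if __name__ == "__main__":
    # Consecutive addresses step through consecutive banks and wrap
    # from bank 63 back to bank 0.
    for address in range(62, 66):
        print(address, decode_address(address))
```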



MEMORY ACCESS 

Each CPU in the system has four memory access ports, referred to as Port
A, Port B, Port C, and I/O. Each port is capable of making one reference
per CP. Ports A, B, and C are used for CPU register transfers.

B, T, and vector memory instructions issue to a particular memory port:

• Vector read (block reads only), B read instructions (176, 034) use
Port A.

• Vector read (block reads only), T read instructions (176, 036) use
Port B.

• Vector store, B, or T store instructions (177, 035, and 037) and
scalar instructions (100-137) use Port C.

Once an instruction issues to a port, that port is reserved until all 
references are made for that instruction. 

The references for each element of a block transfer (V,B,T) are made and 
completed in sequence through a port. However, since each reference is 
examined individually for possible conflicts, the data flow for a 
transfer may not be continuous. If an instruction requires a port that 
is busy, issue is blocked. Total execution time of the transfer depends 
on the number and type of conflicts encountered during the transfer.

*******************************************************

CAUTION

Because concurrent block reads and writes are not
examined for read before write or write before read
(memory overlap hazard conditions), the software must
detect where this condition occurs and ensure
sequential operation.

******************************************************* 



The bidirectional memory mode enable (002500), bidirectional memory mode
disable (002600), and the complete memory reference (002700)
instructions are provided to resolve these cases and assure sequential
operation. If the bidirectional memory mode is clear, block reads and 
writes are not allowed to operate concurrently within that CPU. 
Instruction 002700 allows the program to wait until the last references 
of all preceding block transfers are past the conflict resolution stage 
within the CPU issuing it and the transferred data is being transmitted 
to the designated memory or register locations. Instruction 002700 
provides software a mechanism, wherever necessary in the program, to 
guarantee sequential memory operation within a CPU or between CPUs. 

Issue of scalar memory references requires Ports A, B, and C to be 
available, ensuring sequential operation between block transfers and 
scalar references within a CPU. 

A scalar reference conflict is detected in CP 4 of execution. If a 
conflict occurs, two more scalar references are allowed to issue. A 
fourth scalar reference holds issue if the conflict condition still 
exists for the first scalar reference. 

Scalar references always execute in the order they are issued within a 
CPU. Instruction 002700 detects when all scalar references are past the 
conflict resolution stage within the CPU issuing it. 

An I/O channel references memory through a specific CPU's I/O port (see 
subsection on CPU Input/Output Section). The I/O port can be active
regardless of the activities on Ports A, B, or C. 

For instruction fetches and exchange sequences, the CPUs are allowed 
access to memory in pairs; CPUs 0 and 1 comprise one pair, CPUs 2 and 3
another pair. Only one instruction fetch or exchange sequence can occur 
among the four CPUs at a time. 

When a CPU requests an instruction fetch, referencing from all memory 
ports associated with that CPU pair is inhibited and the 32 banks being 
referenced are reserved (to prevent referencing from the other CPU 
pair). When memory is quiet (0 to 3 CPs), the fetch proceeds and
references 32 banks in the next 4 CPs. Referencing of the eight ports
is not enabled until 3 CPs later, to ensure all 32 banks are quiet. 



NOTE 

A fetch sequence that follows a scalar store can, under 
certain conditions, complete before the store. For this 
to happen, however, an out-of-buffer condition must arise
before the scalar store is in CP 2 of execution. The 
out-of-buffer condition can occur before the scalar store 
is in CP 2 of execution if a buffer boundary is crossed 
without doing a branch. This presents a problem only if 
the fetch and store are to the same area in memory. 
Therefore, software that utilizes dynamic coding should 
ensure that the code generated is actually in memory 
before that area of memory is fetched into the 
instruction buffers. 



During this time, the other CPU pair has access to the remaining banks of 
memory. 

When a CPU requests an exchange, all referencing from the four memory 
ports of the other CPU in the CPU pair is inhibited and 32 banks are 
reserved (to prevent referencing from the other CPU pair). When memory is
quiet (0 to 3 CPs), the exchange proceeds and references 16 banks in the
next 20 CPs. Each bank is referenced twice during this time, once for a
read and once for a write. An exchange sequence requires all activities
within a CPU to complete before the exchange request is made. As with the
instruction fetch, the other CPU pair has access to the remaining banks of 
memory. 

A fetch request follows immediately after the exchange is complete and 
then referencing from the memory ports of the other CPU in the pair is 
enabled. 



Conflict resolution 

During each clock period, references to the memory ports in the system are 
examined for memory access conflicts. If a conflict occurs for a
reference, the reference is held and no further referencing from that port 
is allowed until the conflict is resolved. 

Three types of memory access conflicts can occur: Bank Busy, Simultaneous 
Bank, and Section Access. 

Bank Busy conflict - The Bank Busy conflict is caused by any port within
or between CPUs requesting a bank currently in a reference cycle.
Resolution of this conflict occurs when the bank cycle is complete. All
ports in the CPU are held 1, 2, or 3 CPs because of a Bank Busy conflict.

Simultaneous Bank conflict - The Simultaneous Bank conflict is caused by
two or more ports in different CPUs requesting the same bank. Resolution
of this conflict is based on a priority (see subsection below on Memory
access priorities). All ports in a CPU are held 1 CP because of a
Simultaneous Bank conflict. A Bank Busy conflict always follows a
Simultaneous Bank conflict.

Section Access conflict - The Section Access conflict is caused by two or
more ports in the same CPU requesting any bank in the same section.
Resolution of this conflict is based on priority. The highest priority
port is allowed to proceed; all other ports involved in this conflict
hold (see subsection below on Memory access priorities). The port is
held 1 CP because of a Section Access conflict.



Memory access priorities 

The following priorities are used to resolve memory access conflicts. 

• Intra-CPU priority: the priority between Ports A, B, and C is 
determined by the following conditions: 

- Any port with an odd increment always has a higher priority
than a port with an even increment, regardless of their issued
sequence.

- Among all ports with the same type of increment (odd or even),
the relative time of issue determines the priority, with the
first issued having the highest priority.

• Inter-CPU priority: every 4 CPs the priority between CPUs 
changes. 

• I/O priority: the I/O ports are always lowest priority, within 
CPUs. 
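
The intra-CPU rule above amounts to a two-level ordering: odd increments
before even increments, then earlier issue before later issue. A minimal
sketch follows; the function name intra_cpu_priority and the dictionary
layout are assumptions made for illustration, and the behavior when two
ports share the same issue time is not specified by the text.

```python
def intra_cpu_priority(ports):
    """Order port names from highest to lowest priority.

    `ports` maps a port name ("A", "B", "C") to a tuple
    (increment, issue_cp). Both field names are hypothetical.
    """
    return sorted(
        ports,
        # False sorts before True, so odd increments come first;
        # within each group the earliest issue time wins.
        key=lambda name: (ports[name][0] % 2 == 0, ports[name][1]),
    )
```

For example, with Port A issued first using an even increment and Ports B
and C issued later using odd increments, both B and C outrank A.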



MEMORY ERROR CORRECTION 

A single error correction/double error detection (SECDED) network is 
used between a CPU and memory. SECDED assures that data written into 
memory can be returned to the CPU with consistent precision (figure 2-3).
If a single bit of a data word is altered, the alteration
is automatically corrected before passing the data word to the
computer. If 2 bits of the same data word are altered, the error is
detected but not corrected. In either case, the CPU can be interrupted,
depending on interrupt options selected, to allow processing of the
error. For 3 or more bits in error, results are ambiguous.

Figure 2-3. Memory data path with SECDED



The SECDED error processing scheme is based on error detection and 
correction codes devised by R. W. Hamming.† An 8-bit check byte is
appended to the 64-bit data word before the data is written in memory.
The 8 check bits are generated as even parity bits for a specific group
of data bits. Figure 2-4 shows the bits of the data word used to
determine the state of each check bit. An X in the horizontal row
indicates that data bit contributes to the generation of that check bit.
Thus, check bit 0 is the bit that makes group parity even for the group
of bits 2^1, 2^3, 2^5, 2^7, 2^9, 2^11, 2^13, 2^15, 2^17, 2^19, 2^21, 2^23,
2^25, 2^27, 2^29, and 2^31 through 2^55.



The 8 check bits and the data word are stored in memory at the same 
location. When read from memory, the same 64-bit matrix of figure 2-4 is 
used to generate a new set of check bits, which are compared with the old 
check bits. The resulting 8 comparison bits are called syndrome†† bits
(S bits). The states of these S bits are all symptoms of any error that
occurred (1 = no compare). If all syndrome bits are 0, no memory error is
assumed. 



† Hamming, R. W., "Error Detecting and Error Correcting Codes," Bell System
Technical Journal, 29, No. 2, pp. 147-160 (April, 1950).

†† Syndrome: Any set of characteristics regarded as identifying a
certain type, condition, etc. Webster's New World Dictionary.

[Figure 2-4 matrix: columns are the check byte (bits 2^71 through 2^64)
and data bits 2^63 through 2^0; rows are check bit 0 through check bit 7.
An X marks each bit that contributes to the generation of that check bit.]


Figure 2-4. Error correction matrix 

Any change of state of a single bit in memory causes an odd number of 
syndrome bits to be set to 1. A double error (an error in 2 bits) appears 
as an even number of syndrome bits set to 1. 

The matrix is designed so that: 

• If all syndrome bits are 0, no error is assumed. 

• If only 1 syndrome bit is 1, the associated check bit is in error. 

• If more than 1 syndrome bit is 1 and the parity of syndrome bits 
S0 through S7 is even, then a double error (or an even number of
bit errors) occurred within the data bits or check bits. 






• If more than 1 syndrome bit is 1 and the parity of all syndrome 
bits is odd, then a single and correctable error is assumed to 
have occurred. The syndrome bits can be decoded to identify the 
bit in error. 

• If 3 or more memory bits are in error, the parity of all syndrome 
bits is odd and results are ambiguous. 
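
The decision rules listed above depend only on how many syndrome bits are
set. The sketch below classifies a syndrome on that basis. It is a
simplified illustration: the function name classify_syndrome and the
returned labels are assumptions, the syndrome is assumed to be packed into
an 8-bit integer, and decoding the bit in error would require the matrix of
figure 2-4, which is not modeled here.

```python
def classify_syndrome(syndrome):
    """Classify an 8-bit syndrome (S0 through S7 packed into an int)."""
    if not 0 <= syndrome <= 0xFF:
        raise ValueError("syndrome must be an 8-bit value")
    ones = bin(syndrome).count("1")   # number of syndrome bits set to 1
    if ones == 0:
        return "no error"
    if ones == 1:
        return "check bit error"      # the associated check bit is in error
    if ones % 2 == 0:
        return "double error"         # detected but not correctable
    return "single error"             # odd parity: assumed correctable
```

As the last rule in the list notes, three or more bits in error can also
produce odd syndrome parity, so the "single error" result is an assumption
made by the hardware rather than a certainty.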

Modules involved with generating and interpreting the 8-bit check byte 
used for SECDED include logic that can be used for verifying check bit 
storage, check bit generation, and error detection and correction. Refer 
to Appendix D for information on SECDED maintenance functions. 



INTER-CPU COMMUNICATION SECTION 

The inter-CPU communication section of the system contains special 
hardware for communication among the CPUs, for control, and for a 
real-time clock. The Real-time Clock (RTC), Shared Address (SB), Shared
Scalar (ST), and Semaphore (SM) registers are shared by the CPUs. These
registers with their sources and destinations are shown in figure 2-5 and 
described in the following paragraphs. 



REAL-TIME CLOCK 

The mainframe contains one Real-time Clock (RTC) register shared by the 
CPUs. Programs can be timed precisely by using the clock period (CP) 
counter. This counter is 64 bits wide and advances one count each CP of 
9.5 nanoseconds. Since the clock advances synchronously with program
execution, it can be used to time the program to an exact number of CPs.
However, in such an application, the count can include CPs from
other tasks if an interrupt occurs before the end time is read.

Instructions used with the RTC register are: 

0014j0 RT Sj Enter the RTC register with (Sj) 
072i00 Si RT Transmit (RTC) to Si 

A program reads the CP counter using instruction 072 and resets it with 
instruction 0014j0. Loading or reading the CP counter can occur from 
all CPUs at the same time. If more than one CPU is in monitor mode, the 
software should ensure that only one CPU enters a value into this 
register. 
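
As a rough illustration of timing with the CP counter, the sketch below
converts two counter readings into elapsed CPs and seconds. The function
names are hypothetical, and the two readings are assumed to have been
obtained with instruction 072 before and after the timed code; the
wraparound handling simply reflects the 64-bit width of the counter.

```python
CP_SECONDS = 9.5e-9          # one clock period, as stated in the text
RTC_MASK = (1 << 64) - 1     # the counter is 64 bits wide

def elapsed_cps(start, end):
    """Clock periods between two RTC readings, allowing for counter wrap."""
    return (end - start) & RTC_MASK

def elapsed_seconds(start, end):
    """Elapsed time in seconds between two RTC readings."""
    return elapsed_cps(start, end) * CP_SECONDS
```

For example, a difference of 1000 counts corresponds to 9.5 microseconds.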

Figure 2-5. Shared registers and real-time clock



INTER-CPU COMMUNICATION AND CONTROL 

Five identical sets of shared registers are used for communication and
control among CPUs. Each set contains eight 24-bit Shared Address (SB) 
registers, eight 64-bit Shared Scalar (ST) registers and 32 1-bit 
Semaphore (SM) registers. 

Each CPU's Cluster Number (CLN) register determines which set of shared 
registers is accessed by a CPU (clustering). The CLN register is loaded
from the Exchange Package or, if the CPU is in monitor mode, through
instruction 0014j3.

The CLN register can contain one of six different values. Values 1, 2, 
3, 4, or 5 allow the CPU to access one of the five sets of shared 
registers. Value 0 prevents any access to shared registers by the CPU.
If the value is 0, instructions regarding the shared registers become
no-ops, except for the instructions returning values to Ai or Si,
which return a zero value. If the CLN registers in more than one CPU are
set to the same value (1, 2, 3, 4, or 5), then those CPUs share a common
set of SB, ST, and SM registers.
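
The clustering rule can be summarized in a few lines. This is a sketch
under stated assumptions: the function names are hypothetical, and the
hardware behavior for CLN values outside 0 through 5 is not described in
the text, so the example rejects them.

```python
def shared_register_set(cln):
    """Return the shared register set (1-5) selected by CLN, or None for CLN 0."""
    if cln == 0:
        return None              # no access; shared-register reads return zero
    if 1 <= cln <= 5:
        return cln
    raise ValueError("CLN must be 0 through 5")

def cpus_share_registers(cln_a, cln_b):
    """True if two CPUs with these CLN values access the same register set."""
    set_a = shared_register_set(cln_a)
    return set_a is not None and set_a == shared_register_set(cln_b)
```

Two CPUs that both hold CLN 0 do not share registers; neither has access to
any set.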



Shared Address and Shared Scalar registers 

The Shared Address (SB) and Shared Scalar (ST) registers are used for 
passing address and scalar information from one CPU to another. No 
hardware reservations are made on these registers. Any necessary 
reservations to restrict access to these registers must be handled in the 
software through use of the Semaphore (SM) registers or by shared memory 
design. The single hardware restriction on access to the SB and ST 
registers is that only one read or one write operation can occur in a CP. 

The instructions used with the SB and ST registers are: 

026ij7    Ai SBj    Transmit (SBj) to Ai
027ij7    SBj Ai    Transmit (Ai) to SBj
072ij3    Si STj    Transmit (STj) to Si
073ij3    STj Si    Transmit (Si) to STj

Semaphore registers 

The Semaphore (SM) registers are used for control among the CPUs. No 
hardware reservations are made on these registers. Loading or reading 
the SM registers or setting or clearing a particular SM register can 
occur at any time from any or all CPUs. 

The test and set instruction (0034jk) is the only operation on the SM
registers that includes a hardware interlock. This interlock prevents a
simultaneous test and set operation on the same SM register from more
than one CPU. The test and set instruction first tests the value of the 
selected SM register. If the value is 0, the instruction issues and sets 
that SM register to a 1. If the value is 1, the instruction holds issue 
until the value is 0. 

When all CPUs in a cluster are holding issue on a test and set
instruction, a deadlock interrupt can occur. All CPUs with equal cluster
numbers above 0 belong to the same cluster and must be holding issue on a
test and set instruction to cause a deadlock interrupt. When that
happens, all CPUs in the cluster receive deadlock interrupts. If only
one CPU belongs to a cluster and holds issue on a test and set
instruction, that CPU receives a deadlock interrupt. No deadlock
interrupt can occur in cluster 0 (CLN=0).
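
The deadlock condition described above can be expressed as a check over the
CPUs of each cluster. The sketch below is illustrative only; the function
name and the (cln, holding) tuple layout are assumptions, and the code
reports only where the condition for a deadlock interrupt is met.

```python
def deadlocked_clusters(cpus):
    """Return the cluster numbers in which every CPU holds on test and set.

    `cpus` is a list of (cln, holding) pairs, one per CPU, where `holding`
    is True when that CPU is holding issue on a test and set instruction.
    """
    clusters = {}
    for cln, holding in cpus:
        if cln == 0:
            continue                      # cluster 0 never deadlocks
        clusters.setdefault(cln, []).append(holding)
    return {cln for cln, flags in clusters.items() if all(flags)}
```

A cluster with a single CPU meets the condition as soon as that CPU holds
issue, which matches the single-CPU case in the text.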

When an interrupt occurs, normally the instructions already in the NIP 
and CIP registers are allowed to issue before the exchange sequence 
starts. If a test and set instruction is holding in the CIP register and 
an interrupt occurs, a special exchange start-up sequence is initiated.
In this case, the instruction in the NIP register and the test and set 
instruction in the CIP register are discarded and the Program Counter (P) 
register is adjusted to point to the discarded test and set instruction. 
The Waiting on Semaphore (WS) flag in the Exchange Package sets, 
indicating a test and set instruction was holding in the CIP register 
when the interrupt occurred. The exchange sequence is then started. 

Instructions used with the SM registers are: 

0034jk    SMjk 1,TS    Test and set SMjk
0036jk    SMjk 0       Clear SMjk
0037jk    SMjk 1       Set SMjk
072i02    Si SM        Transmit (SM) to Si
073i02    SM Si        Transmit (Si) to SM



Shared register and semaphore conflicts 

A scanner is used to break a tie caused by simultaneous requests for
access to the Semaphores or Shared registers of any cluster. If there is
no competition for access, no extra hold issues are generated. For
example, instruction 027ij7 holds issue for 3 CPs; if there is an access
conflict, however, issue holds until a scanner with four slots breaks the
tie. A request takes 2 CPs to complete; therefore, subsequent requests can
be accepted every other CP until all requests are resolved.



CPU INPUT/OUTPUT SECTION 

The input/output section of the mainframe is shared by all Central 
Processing Units (CPUs). The mainframe supports three channel types
identified by their maximum transfer rates of 1250 Mbytes per second, 100 
Mbytes per second, and 6 Mbytes per second. 

Two 1250 Mbyte per second channel pairs transfer data between Central 
Memory and a Solid-state Storage Device (SSD). These channels are 128
bits wide and use 16 check bits in each direction. A maximum transfer 
rate of over 10 gigabits per second is possible on a 1250 Mbyte per 
second channel. The channel is two parallel 64-bit channels each with 
SECDED; therefore, under certain circumstances the full-width channel can 
correct double errors. 

Four 100 Mbyte per second channel pairs transfer data between Central 
Memory and an I/O Subsystem. A 100 Mbyte per second channel is 64 bits 
wide and uses 8 check bits in each direction. Data words are transferred 
in blocks of 16 under control of Data Ready and Data Transmit control
signals. Each 100 Mbyte per second channel has a maximum transfer rate 
of approximately 850 Mbits per second. 

I/O Subsystem communication with the CPUs is over four pairs of control 
channels, each with a maximum transfer rate of 6 Mbytes per second. Each 
6 Mbyte per second channel is 16 bits wide. 

All I/O (including 100 Mbyte and 1250 Mbyte per second channels) uses the 
I/O ports to memory. Access to these ports is controlled by a scanner. 
All CPU memory ports (Ports A, B, and C) have higher priority than the 
I/O ports. 

Channel features of the input/output section are summarized below and 
described in the remainder of this section. 

• Two channel pairs with 1250 Mbytes per second maximum 
transfer rate per channel 

- 128 data bits and 16 check bits in each direction 

• Four channel pairs with 100 Mbytes per second maximum 
transfer rate per channel 

- 64 data bits, 3 control bits, and 8 check bits in each 
direction 

• Four I/O channel pairs, 6 Mbytes per second maximum transfer 
rate per channel 

- Shared control from the CPUs 

- 16 data bits, 3 control bits, and 4 parity bits in each 
direction 

- Lost data detection 

• Channels are divided into four groups; each group contains 
either input or output channels. 

• Channel groups are served equally by memory (each group is
scanned every 4 CPs).

• Channel priority is resolved within channel groups. 

DATA TRANSFER FOR SOLID-STATE STORAGE DEVICE

Data is transferred directly between the Solid-state Storage Device (SSD)
and the mainframe using the 1250 Mbyte per second channels. A 1250 Mbyte
per second channel is 128 bits wide and is programmed through software.
Programming details for the SSD are described in the Solid-state Storage
Device (SSD) Reference Manual, CRI publication HR-0031.



DATA TRANSFER FOR I/O SUBSYSTEM 

A 100 Mbyte per second channel pair transfers data between Central Memory 
and the Buffer I/O Processor (BIOP) of the I/O Subsystem. A second 100 
Mbyte per second channel pair transfers data between Central Memory and a 
Disk I/O Processor (DIOP) or Auxiliary I/O Processor (XIOP).† Each
channel is 64 bits wide and handles data at approximately 100 Mbytes per
second. Each channel uses an additional 8 check bits for single error
correction/double error detection (SECDED), as is used in Central Memory.

The CPU side of a 100 Mbyte per second channel pair uses a pair of 
16-word buffers to stream the data out of Central Memory and another pair 
to stream data into Central Memory. On output, as one buffer block is 
being sent to the I/O Processor (IOP), the other buffer is filling from
Central Memory. Similarly, on input, one buffer block is filling from an
IOP while the other is transmitting to Central Memory.

At the IOP side of a 100 Mbyte per second channel pair, data passing into
Local Memory (an I/O Processor's memory) is double buffered and 
disassembled into 16-bit parcels. The channel side passing data from 
Local Memory simply assembles 16-bit parcels into 64-bit words for 
transmission to a CPU. 

An I/O Processor controls a 100 Mbyte per second channel pair linking it 
with Central Memory. The IOP initiates all data transfers on the channel
and performs all error processing required for the channel. There are no 
CPU instructions for the 100 Mbyte per second channel pair. Programming 
details for the 100 Mbyte per second channels are contained in the I/O 
Subsystem Reference Manual, CRI publication HR-0030. 



6 MBYTE PER SECOND CHANNELS 

Standard control channels for the system are 6 Mbyte per second 
channels. Each 6 Mbyte per second channel has 16-bit asynchronous 
control logic used for front-end interfaces. The instructions used with 
6 Mbyte per second channels follow. 



† Software does not currently support data transfer using the 100
Mbyte per second channel pair to an XIOP.

0010jk    CA,Aj Ak    Set the Current Address (CA) register for the
                      channel indicated by (Aj) to (Ak) and activate
                      the channel.

0011jk    CL,Aj Ak    Set the Limit Address (CL) register for the
                      channel indicated by (Aj) to (Ak).

0012jk    CI,Aj       Clear the Interrupt flag and Error flag for the
                      channel indicated by (Aj):
                      Output channel: k=0, clear MC; k=1, set MC.
                      Input channel: k=0, no operation; k=1, clear
                      held ready.

033i00    Ai CI       Transmit channel number to Ai.

033ij0    Ai CA,Aj    Transmit address of channel (Aj) to Ai.

033ij1    Ai CE,Aj    Transmit Error flag of channel (Aj) to Ai.



MULTI-CPU PROGRAMMING 

The 6 Mbyte per second I/O channels can operate from any CPU, and any CPU 
can issue instructions to any of the channels. There is no hardware 
interlock among the CPUs; therefore, software must ensure that only one CPU 
is servicing I/O at a time, while in monitor mode. Instruction 033 is 
independent in nature and can be issued without an interlock. 

The following conditions must be met for an I/O interrupt to occur. 

• No CPU waiting for an exchange 

• No CPU in monitor mode 

• An interrupt is present 

Normally, the interrupt from a 6 Mbyte per second channel is directed 
toward the CPU that last issued a clear interrupt instruction (0012) to 
that channel. However, because an I/O interrupt occurs in only one CPU 
at a time, the following conditions (in priority order) determine the CPU 
toward which the interrupt is directed. Once in monitor mode, a CPU 
should service all I/O interrupts. 

1. All I/O interrupts are directed toward a CPU that has the select 
external interrupt mode set. 

2. If no CPU has selected external interrupts, then interrupts are 
directed toward a CPU holding issue on a test and set instruction. 

3. If neither conditions 1 nor 2 exist or if they exist in all CPUs, 
the interrupt is directed to the CPU that last issued a clear 
interrupt instruction to that channel. 
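
One reading of the three routing conditions above is sketched below. It is
an interpretation, not a statement of the hardware logic: the function
name and the flag names are hypothetical, the phrase "exist in all CPUs" is
taken to apply to each condition separately, and the text does not say how
a single CPU is chosen when several (but not all) CPUs satisfy a condition,
so the sketch returns every candidate.

```python
def interrupt_candidates(cpus, last_clear_cpu):
    """Return the CPU numbers toward which an I/O interrupt may be directed.

    `cpus` maps a CPU number to a dict with the boolean keys
    "select_external" and "holding_test_and_set".
    `last_clear_cpu` is the CPU that last issued 0012 to the channel.
    """
    total = len(cpus)
    for flag in ("select_external", "holding_test_and_set"):
        selected = sorted(n for n, state in cpus.items() if state[flag])
        if len(selected) == total:
            break                 # condition exists in all CPUs: use rule 3
        if selected:
            return selected       # condition singles out some CPUs
    return [last_clear_cpu]       # rule 3: last CPU to clear the interrupt
```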

6 MBYTE PER SECOND CHANNEL OPERATION

Input and output channels access Central Memory directly. Input channels 
store external data in memory and output channels read data from memory. 
A primary task of a channel is to convert 64-bit Central Memory words 
into 16-bit parcels or 16-bit parcels into 64-bit Central Memory words. 
Four parcels make up one Central Memory word with bits of the parcels 
assigned to memory bit positions as shown in table 2-1. In both input 
and output operations, parcel 0 is always transferred first.

Each input or output channel has a data channel (4 parity bits, 16 data 
bits, and 3 control lines), a 64-bit assembly or disassembly register, a
channel Current Address (CA) register, and a channel Limit Address (CL) 
register. 

Three control signals (Ready, Resume, and Disconnect) coordinate the 
transfer of parcels over the channels. In addition to the three control 
signals, the output channel of a pair has a Master Clear line. Appendix 
B describes the signal sequence of a 6 Mbyte per second channel. 

I/O interrupts can be caused by the following: 

• On all output channels, if (CA) becomes equal to (CL), then the
resume for the last parcel transmitted sets interrupt. 

• External device disconnect is received on any input channel and 
channel is active. 

• Channel error condition occurs (described later in this 
section).

The number of the channel causing an interrupt can be determined by
using instruction 033, which reads into Ai the highest priority
channel number requesting an interrupt. The lowest numbered channel
has the highest priority. The interrupt request continues until
cleared by the monitor program, at which point an interrupt from the
next highest priority channel, if present, is sensed. All interrupts
are available through instruction 033 to all CPUs. Channel numbers for
6 Mbyte per second channels are 10₈ through 17₈ (10/11, 12/13, 14/15,
and 16/17; even for input, odd for output).
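
The channel numbering and priority rules can be sketched as follows. The
function names are hypothetical, and returning None when no channel is
requesting an interrupt is an assumption made for the example; the text
does not state what instruction 033 returns in that case.

```python
SIX_MBYTE_CHANNELS = range(0o10, 0o20)   # channels 10 through 17 (octal)

def channel_direction(channel):
    """Even-numbered 6 Mbyte channels are inputs; odd-numbered are outputs."""
    if channel not in SIX_MBYTE_CHANNELS:
        raise ValueError("not a 6 Mbyte per second channel")
    return "input" if channel % 2 == 0 else "output"

def highest_priority_channel(requesting):
    """Lowest numbered requesting channel, or None if none is requesting."""
    return min(requesting) if requesting else None
```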



INPUT CHANNEL PROGRAMMING 

To start an input operation, the CPU program (see figure 2-6):

1. Sets the channel limit address to the last word address + 1
(LWA+1), and

2. Sets the channel current address to the first word address (FWA).

Table 2-1. Channel word assembly/disassembly

Characteristic         Bit position   Number    Comment
                                      of bits

Channel data bits      2^15-2^0       16        Four 4-bit groups
Channel parity bits                   4         One per 4-bit group
CRAY X-MP word         2^63-2^0       64
Parcel 0               2^63-2^48      16        First in or out
Parcel 1               2^47-2^32      16        Second in or out
Parcel 2               2^31-2^16      16        Third in or out
Parcel 3               2^15-2^0       16        Fourth in or out



Setting the current address causes the Channel Active flag to set. The 
channel is then ready to receive data. When a 4-parcel word is assembled, 
the word is stored in memory at the address contained in the CA register. 
When the word is accepted by memory, the current address is advanced by 1. 

An external transmitting device sends a Disconnect signal to indicate end 
of a transfer. When the Disconnect signal is received, the Channel 
Interrupt flag sets and a test is performed to check for a partially 
assembled word. If the partial word is found, the valid portion of the 
word is stored in memory and the unreceived, low-order parcels are stored 
as zeros. 
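
The parcel-to-word mapping of table 2-1, including the zero fill of a
partially assembled word, can be sketched as follows. The function names
are hypothetical and the code models only the bit placement, not the
channel signals.

```python
def assemble_word(parcels):
    """Pack one to four 16-bit parcels into a 64-bit word.

    Parcel 0 occupies bits 2^63-2^48; parcels not received before a
    Disconnect are stored as zeros in the low-order positions.
    """
    if not 1 <= len(parcels) <= 4:
        raise ValueError("expected one to four parcels")
    word = 0
    for index in range(4):
        parcel = parcels[index] if index < len(parcels) else 0
        if not 0 <= parcel <= 0xFFFF:
            raise ValueError("parcel must be a 16-bit value")
        word = (word << 16) | parcel
    return word

def disassemble_word(word):
    """Split a 64-bit word into four parcels, parcel 0 first."""
    return [(word >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]
```

For example, a transfer that ends after two parcels stores those parcels in
the upper half of the word and zeros in the lower half.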

The Interrupt flag sets when a Disconnect signal is received or when the 
Channel Error flag is set. 



INPUT CHANNEL ERROR CONDITIONS 

Input channel error conditions can occur at a parcel level (parity 
error). When a parcel in error occurs, the Parity Fault flag sets 
immediately. The Parity Fault flag does not generate an interrupt; it is
saved and sets the Error flag when a disconnect occurs. Therefore, the 
program should check the state of the Error flag when an interrupt is 
honored. All parcels stored after the error are zeroed. 

If a Ready signal is received when the channel is not active, the Ready 
condition is held until the channel is activated. At this time, a Resume 
signal is sent. No Error flag is set and no interrupt request is 
generated. Since the Ready condition is held when the channel is 
inactive, it is sometimes advantageous to be able to clear this Ready
signal before setting up the channel, especially on a deadstart or a
resynchronization of the channel after an error. The Ready signal can be
cleared by using instruction 0012j1 to input channel (Aj), clearing any
Ready signal being held before issue of instruction 0012j1.

Figure 2-6. Basic I/O program flowchart



OUTPUT CHANNEL PROGRAMMING 

To start an output operation, the CPU program: 

1. Sets the channel limit address to the last word address + 1 
(LWA+1), and 

2. Sets the channel current address to the first word address (FWA) . 

Setting the current address causes the Channel Active flag to be set. 
The channel reads the first word from memory addressed by the contents of 
the CA register. When the word is received from memory, the channel 
advances the current address by 1 and starts the data transfer. 

After each word is read from memory and the current address is advanced, 
the limit test is made, comparing the contents of the CA register and the 
CL register. If they are equal, the operation is complete as soon as the 
last parcel transfer is finished. 
The Interrupt flag also sets if an error is detected. The only error
that an output channel detects is a Resume signal received when the 
channel is inactive. No external response is generated. 
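
The output sequence described above (set CL, set CA, read a word, advance
CA, test against CL) can be modeled with a short loop. This is a sketch
only: the function name is hypothetical, memory is represented as a Python
list of 64-bit integers, and clearing the active state at the limit is an
assumption made for the example.

```python
def run_output_channel(memory, fwa, lwa_plus_1):
    """Model one output transfer; returns (parcels_sent, final_ca, interrupt)."""
    if not 0 <= fwa < lwa_plus_1 <= len(memory):
        raise ValueError("expected 0 <= FWA < LWA+1 <= memory size")
    cl = lwa_plus_1            # step 1: set the channel limit address
    ca = fwa                   # step 2: setting CA sets the Channel Active flag
    active, interrupt = True, False
    parcels = []
    while active:
        word = memory[ca]      # read the word addressed by (CA)
        ca += 1                # advance the current address by 1
        # Disassemble into four 16-bit parcels, parcel 0 first.
        parcels.extend((word >> shift) & 0xFFFF for shift in (48, 32, 16, 0))
        if ca == cl:           # limit test after each word
            active = False     # assumed: the channel goes inactive at the limit
            interrupt = True   # set on the resume for the last parcel
    return parcels, ca, interrupt
```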



PROGRAMMED MASTER CLEAR TO EXTERNAL DEVICE 

The system can send a Master Clear signal to an external device through 
the output channel. The external Master Clear sequence is as follows. 

1. 0012jk     Clears input channel to ensure external activity on the
              channel pair has stopped.

2. 0012j1     Clears output channel to ensure CPU activity on the channel
              pair has stopped. Sets Master Clear.

3. Delay 1    Device dependent; determines the duration of the Master
              Clear signal.

4. 0012j0     Clears the output channel. This turns off the Master Clear
              signal.

5. Delay 2    Device dependent; allows time for initialization activities
              in the attached device to complete.

For Cray Research, Inc., front-end interfaces, delays 1 and 2 should each
be a minimum of 80 CPs. 



ACCESS TO CENTRAL MEMORY 

Each CPU has one I/O port to memory. Channels are divided into four
groups and scanned to allow access to memory. Each of the four channel 
groups shown below is assigned a time slot (figure 2-7) that is scanned 
for a memory request once every 4 CPs. The channel listed first in each 
group has the highest priority. During the next 3 CPs, the scanner 
allows requests from the other three channel groups. Therefore, an I/O 
memory request can occur every CP. The scanner stops for all memory 
conflicts caused by an I/O reference and also stops for a block (100 
Mbyte per second channel) reference while a buffer is referencing, 
maximum 16 words (figure 2-8).

Channels A, B, C, and D are 100 Mbyte per second channels. Channels 6 
and 7 are 1250 Mbyte per second channels. Channels 10 through 17 are 6 
Mbyte per second channels. 
An I/O memory request can be locked out by an exchange sequence or
instruction fetch sequence. 



MEMORY BANK CONFLICTS 

Memory bank conflicts are tested for CPU scalar, vector, and I/O memory 
references. When an exchange sequence or instruction fetch sequence is 
in progress, all other memory references for the CPU pair are locked out. 

Each memory bank can accept a new request every 4 CPs. To test for a 
memory bank conflict, the 6 low-order bits of the memory address are 
checked against Bank Busy conflicts and other memory references. The 
bank is busy for 4 CPs on a reference. 
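
The bank test described above can be sketched as a small tracker keyed on
the 6 low-order address bits. The class and method names are hypothetical,
and the model covers only the Bank Busy check, not the simultaneous or
section conflicts described earlier.

```python
BANK_BUSY_CPS = 4   # a bank is busy for 4 CPs on a reference

class BankTracker:
    """Track when each of the 64 banks can accept its next reference."""

    def __init__(self):
        self.free_at = [0] * 64          # CP at which each bank becomes free

    def request(self, address, cp):
        """Return True if the reference is accepted at clock period `cp`."""
        bank = address & 0o77            # 6 low-order bits select the bank
        if cp < self.free_at[bank]:
            return False                 # Bank Busy conflict: reference held
        self.free_at[bank] = cp + BANK_BUSY_CPS
        return True
```

A reference to the same bank is refused until 4 CPs after the previous
reference, while a reference to a different bank can proceed at once.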



I/O MEMORY CONFLICTS 

Before testing for a memory bank conflict, a check is made to ensure no 
exchange sequence or instruction fetch sequence is in progress. If 
either of these conditions exists, the I/O request is held. The 6 
low-order address bits are tested against Bank Busy conflicts and other 
memory references. If a bank being referenced is busy, the reference is 
held and the scanner is stopped. 

Figure 2-7. Channel I/O control (shown for CPU 0)

Figure 2-8. Input/output data paths (for CPU 0)

I/O MEMORY REQUEST CONDITIONS

The following conditions must be present for an I/O memory request to 
be processed: 

• I/O request 

• Bank not busy 

• No simultaneous conflicts with other memory ports 

• No fetch request within the CPU pair 

• No exchange sequence within the CPU pair 
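
The five conditions listed above can be read as a single logical AND. The following sketch expresses that reading; the function and parameter names are hypothetical and simply mirror the list.

```python
def io_request_can_proceed(io_request, bank_busy, port_conflict,
                           fetch_in_pair, exchange_in_pair):
    """Return True only when every listed condition is satisfied."""
    return (io_request
            and not bank_busy
            and not port_conflict
            and not fetch_in_pair
            and not exchange_in_pair)
```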

I/O MEMORY ADDRESSING 

All I/O memory references are absolute. The CA and CL registers are 
24 bits, allowing I/O access to all of memory. Setting of the CA and 
CL registers is limited to monitor mode. I/O memory reference 
addresses are not checked for range errors. 






CPU CONTROL SECTION 



INTRODUCTION 

All CPUs have identical, independent control sections containing 
registers and instruction buffers for instruction issue and control. A 
control section uses an exchange mechanism for switching instruction 
execution from program to program. These registers and buffers and the 
exchange mechanism are described in this section. Memory field 
protection, programmable clock, and deadstart sequence are also described. 



INSTRUCTION ISSUE AND CONTROL 

The registers and instruction buffers involved with instruction issue and 
control are described in the following paragraphs. Figure 3-1 
illustrates the general flow of instruction parcels through the registers 
and buffers. 

Figure 3-1. Instruction issue and control elements 



PROGRAM ADDRESS REGISTER 

The 24-bit Program Address (P) register indicates the next parcel of 
program code to enter the Next Instruction Parcel (NIP) register. The 
high-order 22 bits of the P register indicate the word address for the 
program word in memory relative to the base address. (Program size is 
limited to 4 million words.) The low-order 2 bits indicate the parcel 
within the word. Except on a branch instruction when the branch is taken 
or on an exchange, the contents of the P register are advanced 1 when an 
instruction parcel enters the NIP register. 
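
As an illustration of the P register layout described above, the following sketch splits a 24-bit parcel address into its word address and parcel number. The function names are hypothetical, and the wraparound at 24 bits in `advance_p` is an assumption of the example rather than a statement from the text.

```python
P_MASK = 0xFFFFFF  # P is a 24-bit register


def split_p(p):
    """Return (word address, parcel within word) for a 24-bit P value."""
    p &= P_MASK
    return p >> 2, p & 0b11   # high-order 22 bits, low-order 2 bits


def advance_p(p):
    """Advance P by one parcel (assumed to wrap at 24 bits)."""
    return (p + 1) & P_MASK
```

Four consecutive parcels share one word address, so advancing from parcel 3 of a word moves to parcel 0 of the next word.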

New data enters the P register on an instruction branch or on an exchange 
sequence. (The exchange sequence is described under Exchange Mechanism 
later in this section.) The contents of P are then advanced sequentially 
until the next branch or exchange sequence. The value in the P register 
is stored directly into the terminating Exchange Package during an 
exchange sequence. 

The P register is not master cleared. The value stored in P might not be 
accurate during the deadstart sequence. 



NEXT INSTRUCTION PARCEL REGISTER 

The 16-bit Next Instruction Parcel (NIP) register holds a parcel of 
program code before it enters the Current Instruction Parcel (CIP) 
register. 

The NIP register is not master cleared. An undetermined instruction can 
issue during the master clear interval before the interrupt condition 
blocks data entry into the NIP register. 



CURRENT INSTRUCTION PARCEL REGISTER 

The 16-bit Current Instruction Parcel (CIP) register holds the 
instruction waiting to issue. The term issue indicates the transition 
of an instruction in CIP to its execution phase. If an instruction is a 
2-parcel instruction, the CIP register holds the first parcel of the 
instruction and the Lower Instruction Parcel (LIP) register holds the 
second parcel. Issue of an instruction in CIP can be delayed until 
conflicting operations have been completed. Data arrives at the CIP 
register from the NIP register. Indicators making up the instruction are 
distributed to all modules having mode selection requirements when the 
instruction issues. 

The control flags associated with the CIP register are master cleared; 
the register itself is not. An undetermined instruction can issue during 
the master clear sequence. 



LOWER INSTRUCTION PARCEL REGISTER 

The 16-bit Lower Instruction Parcel (LIP) register holds the second 
parcel of a 2-parcel instruction at the time the first parcel of the 
2-parcel instruction is in the CIP register. 



INSTRUCTION BUFFERS 

A CPU has four instruction buffers, each of which can hold 128 
consecutive 16-bit instruction parcels (figure 3-2). Instruction parcels 
are held in the buffers before being delivered to the NIP or LIP 
registers. 

Figure 3-2. Instruction buffers 



The beginning instruction parcel in a buffer always has a word address 
that is a multiple of 40₈ (a parcel address that is a multiple of 
200₈), allowing the entire range of addresses for instructions in a 
buffer to be defined by the high-order 17 bits of the parcel address. 
Each buffer has a 17-bit Beginning Address register containing this value. 






The Beginning Address registers are scanned each CP. If the high-order 
17 bits of the P register match one of the beginning addresses, an 
in-buffer condition exists and the proper instruction parcel is selected 
from that instruction buffer. An instruction parcel to be executed 
normally is sent to the NIP. However, the second parcel of a 2-parcel 
instruction is blocked from entering the NIP register and is sent to the 
LIP register instead. The second parcel of the 2-parcel instruction 
becomes available when the first parcel issues from the CIP register. At 
the same time, an all-zero parcel is entered into the NIP register. 

On an in-buffer condition, if the instruction is in a different buffer 
than the previous instruction, a change of buffers occurs requiring a 
2-CP delay of the instruction reaching the NIP register. 

An out-of-buffer condition exists when the high-order 17 bits of the P 
register do not match any instruction buffer beginning address. When 
this condition occurs, instructions must be loaded from memory into one 
of the instruction buffers before execution can continue. A 2-bit 
counter determines the instruction buffer receiving the instructions. 
Each out-of-buffer condition causes the counter to be incremented by 1 so 
that the buffers are selected in rotation. 
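
The in-buffer test and the rotating buffer selection can be sketched as follows. This is a simplified model under stated assumptions: the class and method names are invented for the example, buffer contents and timing are not modeled, and the text does not say whether the 2-bit counter is used before or after it is incremented (the sketch uses it before).

```python
class InstructionBuffers:
    """Hypothetical model of the four Beginning Address registers."""

    def __init__(self):
        self.begin = [None] * 4   # 17-bit beginning addresses (None = void)
        self.counter = 0          # 2-bit counter selecting the next buffer

    @staticmethod
    def tag(p):
        """High-order 17 bits of the 24-bit parcel address."""
        return (p & 0xFFFFFF) >> 7

    def lookup(self, p):
        """Return the index of the buffer holding parcel p, or None."""
        t = self.tag(p)
        for index, begin in enumerate(self.begin):
            if begin == t:
                return index
        return None

    def fetch(self, p):
        """Return (buffer index, in_buffer); load a buffer on a miss."""
        hit = self.lookup(p)
        if hit is not None:
            return hit, True                 # in-buffer condition
        index = self.counter                 # out-of-buffer condition
        self.begin[index] = self.tag(p)
        self.counter = (self.counter + 1) & 0b11
        return index, False
```

Because each buffer covers 128 parcels, parcel addresses 0 through 127 share one beginning address, and the fifth distinct block loaded replaces the contents of the first buffer.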

Buffers are loaded from memory at the rate of eight words per CP. The 
first group of 32 parcels delivered to the buffer always contains the 
next instruction required for execution. For this reason, the branch 
out-of-buffer time is 16 CPs for 64-bank memories, providing memory is 
not busy (if busy, the branch fetch is delayed until the busy is 
resolved). Once the fetch proceeds, the remaining groups arrive at a 
rate of 32 parcels per CP and circularly fill the buffer. 

An exchange sequence voids the instruction buffers, preventing a match 
with the P register and causing the buffers to be loaded as needed. 

Forward and backward branching is possible within buffers. Branching 
does not cause reloading of an instruction buffer if the address of the 
instruction being branched to is within one of the buffers. Multiple 
copies of instruction parcels cannot occur in the instruction buffers. 
Because instructions are held in instruction buffers before issue and 
after (until the buffer is reloaded), self-modifying code should not be 
used. Also, because of independent data and instruction memory 
protection, self-modifying code may be impossible. As long as the 
address of the unmodified instruction is in an instruction buffer, the 
modified instruction in memory is not loaded into an instruction buffer. 

Although optimizing code segment lengths for instruction buffers is not a 
prime consideration when programming a CPU, the number and size of the 
buffers and the capability for forward and backward branching can be used 
to good advantage. Large loops containing up to 512 consecutive 
instruction parcels can be maintained in the four buffers. An 
alternative is for a main program sequence in one or two of the buffers 
to make repeated calls to short subroutines maintained in the other 
buffers. The program and subroutines remain undisturbed in the buffers 
as long as no out-of-buffer condition or exchange causes reloading of a 
buffer. 



EXCHANGE MECHANISM 

A CPU uses an exchange mechanism for switching instruction execution from 
program to program. This exchange mechanism involves the use of blocks 
of program parameters known as Exchange Packages and a CPU operation 
referred to as an exchange sequence. For the convenience of Cray 
Assembly Language (CAL) programmers, an alternate bit position 
representation is used when discussing the Exchange Package. The bits 
are numbered from left to right with bit 0 assigned to the 2^63 bit 
position. 



EXCHANGE PACKAGE 

The Exchange Package (figure 3-3) is a 16-word block of data in memory 
associated with a particular computer program. The Exchange Package 
contains the basic parameters necessary to provide continuity from one 
execution interval for the program to the next. 

The Exchange Package contents (table 3-1) are arranged in a 16-word 
block. The exchange sequence swaps data from memory to the operating 
registers and back to memory. This sequence exchanges data in an active 
Exchange Package residing in the operating registers with an inactive 
Exchange Package in memory. The Exchange Address (XA) register address 
of the active Exchange Package specifies the memory address to be used 
for the swap. Data is exchanged and a new program execution interval is 
initiated by the exchange sequence. 

The contents of the B, T, V, VM, SB, ST, and SM registers are not swapped 
in the exchange sequence. Data in these registers must be stored and 
replaced as required by specific coding in the program supervising the 
object program execution or by any program that needs this data. (See 
section 4 for descriptions of the operating registers and the VL 
register.) 

Figure 3-3. Exchange Package for a 4-processor system 

Table 3-1. Exchange Package assignments 

Field                                  Word    Bits 

Processor number (PN)                  0       0-1 
Error type (E)                         0       2-3 
Syndrome bits (S)                      0       4-11 
Program Address register (P)           0       16-39 
Read mode (R)                          1       0-1 
Read address (CSB)                     1       2-5 (CS); 6-11 (B) 
Instruction Base Address (IBA)         1       16-33 
Instruction Limit Address (ILA)        2       16-33 
Mode register (M)                      1-2     35-39 
Vector not used (VNU)                  2 
Enable Second Vector Logical (ESVL)    3 
Flag register (F)                      3       14-15; 31-39 
Exchange Address register (XA)         3       16-23 
Vector Length register (VL)            3       24-30 
Enhanced Addressing Mode (EAM)         4 
Data Base Address (DBA)                4       16-33 
Program State (PS)                     4       35 
Cluster Number (CLN)                   4       37-39 
Data Limit Address (DLA)               5       16-33 
Eight A register contents              0-7     40-63 
Eight S register contents              8-15    0-63 
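
To make the table easier to apply, the following sketch extracts a few fields from a 16-word Exchange Package image. It assumes the left-to-right bit numbering used for the Exchange Package, with bit 0 taken as the most significant (2^63) position of a 64-bit word. The helper names and the representation of the package as a list of 16 Python integers are assumptions of the example.

```python
def field(word, first_bit, last_bit):
    """Extract bits first_bit..last_bit (inclusive, bit 0 = 2^63)."""
    width = last_bit - first_bit + 1
    return (word >> (63 - last_bit)) & ((1 << width) - 1)


def unpack_some_fields(package):
    """Return a few table 3-1 fields from a list of 16 64-bit words."""
    return {
        "P": field(package[0], 16, 39),    # Program Address register
        "XA": field(package[3], 16, 23),   # Exchange Address register
        "VL": field(package[3], 24, 30),   # Vector Length register
        "A": [field(package[i], 40, 63) for i in range(8)],
        "S": list(package[8:16]),          # full 64-bit words
    }
```

Only fields whose word and bit positions appear in the table are decoded here.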



Processor number 

The contents of the 2-bit processor number (PN) position in the Exchange 
Package indicate the CPU in which the Exchange Package executed. This value 
is not read into the CPU; it is a constant inserted only into a package 
being stored. 



Vector not used (VNU) 

The content of the vector not used (VNU) position in the Exchange Package 
indicates whether or not instructions 076, 077, or 140 through 177 were 
issued during the execution interval. If none of the instructions were 
issued, the bit remains set. If one or more of the instructions issued, 
the bit is cleared. Once cleared, the bit will remain clear until reset 
through a memory store to the dormant Exchange Package. 



Enable Second Vector Logical (ESVL) 

The content of the enable second vector logical (ESVL) position in the 
Exchange Package indicates whether or not the Second Vector Logical unit 
can be used. If set, instructions 140 through 145 may select the Second 
Vector Logical unit. If clear, the Second Vector Logical unit cannot be 
used; only the Full Vector Logical unit may be used. 

Enhanced Addressing Mode (EAM) 

The content of the enhanced addressing mode (EAM) position in the 
Exchange Package indicates whether or not address extension will take 
place for address calculations. If set, instructions 100 through 137 
will sign-extend the 22-bit value (jkm) to 24 bits for address 
calculations (compatible with an 8-million word system). If clear, all 
CPU memory addresses (not I/O) will have address bits 2^22 and 2^23 
replaced by data base address bits 2^22 and 2^23, respectively. 
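
The two behaviors selected by the EAM bit can be sketched as follows. This is a hedged illustration: it assumes the two affected address bits are the top two bits (2^22 and 2^23) of a 24-bit address, and the function names are hypothetical.

```python
def sign_extend_22_to_24(jkm):
    """EAM set: sign-extend a 22-bit value to 24 bits."""
    jkm &= 0x3FFFFF
    if jkm & 0x200000:        # sign bit of the 22-bit value (2^21)
        jkm |= 0xC00000       # copy it into bits 2^22 and 2^23
    return jkm


def replace_high_bits(address, data_base_address):
    """EAM clear: take bits 2^22 and 2^23 from the data base address."""
    return (address & 0x3FFFFF) | (data_base_address & 0xC00000)
```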

Memory error data 

Bit 36 (interrupt on correctable memory error bit) and bit 38 (interrupt 
on uncorrectable memory error bit) in the M (mode) register determine if 
memory error data is included in the Exchange Package. Error data, 
consisting of four fields of information, appears in the Exchange Package 
if bit 36 is set and a correctable memory error is encountered or if bit 
38 is set and an uncorrectable memory error is detected. 

Memory error data fields are described below. 

E (Error type)   The type of memory error encountered, uncorrectable 
                 or correctable, is indicated in word 0, bits 2 and 
                 3 of the Exchange Package. Bit 2 is set for an 
                 uncorrectable memory error; bit 3 is set for a 
                 correctable memory error. 

S (Syndrome)     The 8 syndrome bits used in detecting a memory data 
                 error are returned in word 0, bits 4 through 11 of 
                 the Exchange Package. See section 2 for additional 
                 information. 

R (Read mode)    This field indicates the read mode in progress when 
                 a memory data error occurred and is in word 1, bits 
                 0 and 1 of the Exchange Package. These bits assume 
                 the following values: 

                 00   I/O 
                 01   Scalar (memory references with A or S) 
                 10   Vector, B, or T 
                 11   Instruction fetch or exchange 



For multiple bit memory errors, the hardware always sets the 
correctable Memory Error flag in the interrupted Exchange Package. 

CSB (Read address)  The 10-bit CSB field contains the address where a 
                 memory data error occurred. Word 1, bits 6-11 (B) 
                 of the Exchange Package contain bits 2^0 through 
                 2^5 of the address and can be considered the bank 
                 address. Word 1, bits 2 through 5 (CS) of the 
                 Exchange Package contain the chip select bits of 
                 the address. 



EXCHANGE REGISTERS 

Three special registers are instrumental in the exchange mechanism: the 
Exchange Address (XA) register, the Mode (M) register, and the Flag (F) 
register. These three registers are described below. 



Exchange Address register 

The 8-bit Exchange Address (XA) register specifies the first word address 
of a 16-word Exchange Package loaded by an exchange operation. The 
register contains the high-order 8 bits of a 12-bit field specifying the 
address. The low-order 4 bits of the field are always 0; an Exchange 
Package must begin on a 16-word boundary. The 12-bit limit requires that 
the absolute address be in the lower 4096 (10,000₈) words of memory. 

When an execution interval terminates, the exchange sequence exchanges 
the contents of the registers with the contents of the Exchange Package 
at the beginning address (XA) in memory. 
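
The relationship between the 8-bit XA value and the Exchange Package address can be shown with a short sketch. The function name is hypothetical; the arithmetic follows from XA holding the high-order 8 bits of a 12-bit address whose low-order 4 bits are 0.

```python
def exchange_package_address(xa):
    """Return the first word address of the package selected by XA."""
    if not 0 <= xa <= 0xFF:
        raise ValueError("XA is an 8-bit register")
    return xa << 4   # low-order 4 bits of the 12-bit address are 0
```

The largest XA value selects address 7760 (octal), so the last possible package ends at word 4095.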



Mode register 

The 10-bit Mode (M) register contains part of the Exchange Package for a 
currently active program. The M register bits are assigned in words 1 
and 2 of the Exchange Package as follows. 

Word 1 

Bit Description 

35 Waiting for Semaphore (WS) flag; when set, the CPU exchanged 
when a test and set instruction was holding in the CIP 
register. 

36 Floating-point Error Status (FPS) flag; when set, a 
floating-point error has occurred regardless of the state of 
the Floating-point Error Mode flag. 

37 Bidirectional Memory Mode (BDM) flag; when set, block reads 
and writes can operate concurrently. 

38 Selected for External Interrupts (SEI) flag; when set, this 
CPU is preferred for I/O interrupts. 

39 Interrupt Monitor Mode (IMM) flag; when set, enables all 
interrupts in monitor mode except PC, MCU, I/O, and normal 
exit. 

Word 2 

Bit Description 

35 Operand Range Error Mode (IOR) flag; when set, enables 
interrupts on operand range errors. 

36 Correctable Memory Error Mode (ICM) flag; when set, enables 
interrupts on correctable memory data errors. 

37 Floating-point Error Mode (IFP) flag; when set, enables 
interrupts on floating-point errors. 

38 Uncorrectable Memory Error Mode (IUM) flag; when set, enables 
interrupts on uncorrectable memory data errors. 

39 Monitor Mode (MM) flag; when set, inhibits all interrupts 
except memory errors. 

The 10 bits are set selectively during an exchange sequence. 
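
As a reading aid, the M register bit assignments listed above can be decoded from words 1 and 2 of an Exchange Package as sketched below. The sketch assumes bit 0 is the most significant (2^63) position of each 64-bit word. The function names are hypothetical, and the flag mnemonics follow the list above.

```python
WORD1_FLAGS = {35: "WS", 36: "FPS", 37: "BDM", 38: "SEI", 39: "IMM"}
WORD2_FLAGS = {35: "IOR", 36: "ICM", 37: "IFP", 38: "IUM", 39: "MM"}


def bit(word, n):
    """Return bit n of a 64-bit word, numbering from the left."""
    return (word >> (63 - n)) & 1


def decode_mode(word1, word2):
    """Return the set of M register flag mnemonics that are set."""
    flags = {name for n, name in WORD1_FLAGS.items() if bit(word1, n)}
    flags |= {name for n, name in WORD2_FLAGS.items() if bit(word2, n)}
    return flags
```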

Word 1, bit 37 (Bidirectional Memory Mode flag) can be set or cleared by 
using instructions 0026 (enable bidirectional memory transfers) and 0025 
(disable bidirectional memory transfers). 

Word 2, bit 35 (Operand Range Error Mode flag) can be set or cleared 
during the execution interval of a program by using instructions 0023 
(enable interrupt on operand range error) and 0024 (disable interrupt on 
operand range error). 

Word 2, bit 37 (Floating-point Error Mode flag) can be set or cleared 
during the execution interval for a program by using instructions 0021 
(enable interrupt on floating-point error) and 0022 (disable interrupt on 
floating-point error). 

Word 1, bits 36 and 37 and word 2, bits 35 and 37 can be read with 
instruction 073i01. Word 1, bits 35 and 36 indicate the state of the 
CPU at the time of the exchange. The remaining bits are not altered 
during the execution interval for the Exchange Package and can be altered 
only when the Exchange Package is inactive in storage. 



Flag register 

The 11-bit Flag (F) register contains part of the Exchange Package for 
the currently active program. This register is located in word 3 and 
contains 11 flags individually identified within the Exchange Package. 
Setting any of these flags interrupts program execution. When one or 
more flags are set, a Request Interrupt signal is sent to initiate an 
exchange sequence. The contents of the F register are stored along with 
the rest of the Exchange Package. The monitor program can analyze the 11 
flags for the cause of the interruption. Before the monitor program 
exchanges back to the package, it must clear the flags in the F register 
area of the package. If any bit remains set, another exchange occurs 
immediately. 

The F register bits are assigned in word 3 of the Exchange Package as 
follows. 

Word 3 



Bit Description 

14 Interrupt From Internal CPU (ICP) flag; set when another 
CPU issues instruction 0014j1. 



15 Deadlock (DL) flag; set when all CPUs in a cluster are 
holding issue on a test and set instruction. 

31 Programmable Clock Interrupt (PCI) flag; set when the 
interrupt countdown counter in the programmable clock equals 
0. The programmable clock is explained later in this section. 

32 MCU Interrupt (MCU) flag; set when the HIOP sends this signal. 

33 Floating-point Error (FPE) flag; set when a floating-point 
range error occurs in any of the floating-point functional 
units and the Enable Floating-point Interrupt flag is set. 
Floating-point functional units are explained in section 4, 
Computation. 

34 Operand Range Error (ORE) flag; set when a data reference is 
made outside the boundaries of the Data Base Address (DBA) 
and Data Limit Address (DLA) registers and the Enable Operand 
Range Interrupt flag is set. Operand range error is 
explained later in this section. 

35 Program Range Error (PRE) flag; set when an instruction fetch 
is made outside the boundaries of the Instruction Base 
Address (IBA) and Instruction Limit Address (ILA) registers. 
Program range error is explained later in this section. 

36 Memory Error (ME) flag; set when a correctable or 
uncorrectable memory error occurs and the corresponding 
enable memory error mode bit is set in the M register. 

37 I/O Interrupt (IOI) flag; set when a 6 Mbyte channel or the 
1250 Mbyte channel completes a transfer. 

38 Error Exit (EEX) flag; set by an error exit instruction (000). 

39 Normal Exit (NEX) flag; set by a normal exit instruction 
(004). 

Any flag (except the Memory Error flag) can be set in the F register only 
if the active Exchange Package is not in monitor mode. Such flags are 
set only if word 2, bit 39 of the M register is 0. Except for the Memory 
Error flag, if the program is in monitor mode and the conditions for 
setting an F register flag are present, the flag remains cleared and no 
exchange sequence is initiated. 
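
The monitor mode rule for the F register can be sketched as a filter over requested flags. This is a simplified model: the function names are hypothetical, the inputs are assumed to be conditions whose own enable bits (for example, the floating-point or operand range enable) are already satisfied, and the effect of the Interrupt Monitor Mode (IMM) flag is not modeled.

```python
F_FLAGS = {14: "ICP", 15: "DL", 31: "PCI", 32: "MCU", 33: "FPE",
           34: "ORE", 35: "PRE", 36: "ME", 37: "IOI", 38: "EEX",
           39: "NEX"}   # word 3 bit position -> flag mnemonic


def effective_flags(requested, monitor_mode):
    """Return the flags that actually set in the F register."""
    if monitor_mode:
        # In monitor mode only the Memory Error flag can set.
        return {flag for flag in requested if flag == "ME"}
    return set(requested)


def exchange_requested(requested, monitor_mode):
    """Return True if any flag sets, which requests an exchange."""
    return bool(effective_flags(requested, monitor_mode))
```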



Cluster Number register 

The 3-bit Cluster Number (CLN) register determines the CPU's cluster. 
The contents of the CLN register are used to determine which set of SB, 
ST, and SM registers the CPU can access. If the CLN register is 0, then 
the CPU does not have access to SB, ST, or SM registers. The contents of 
the CLN registers in all CPUs are also used to determine the condition 
necessary for a deadlock interrupt. 

Program State register 

The content of the 1-bit Program State (PS) register is manipulated by 
the operating system to represent different program states in the CPUs 
concurrently processing a single program. 

A registers 

The current contents of all A registers are stored in bits 40 through 63 
of words 0 through 7 during exchange. 



S registers 

The current contents of all S registers are stored in bits 0 through 63 
of words 8 through 15 during exchange. 



Program Address register 

The contents of the Program Address (P) register (address of first 
program instruction not yet issued) are stored in bits 16 through 39 of 
word 0 (maximum program size is 4 million words). The instruction at 
this location is the first instruction to be issued when this program 
begins again. 



Memory field registers 

Each object program has a designated field of memory for instructions and 
data that is specified by the monitor program when the object program is 
loaded and initiated. All memory addresses contained in the object 
program code are relative to one of two base addresses specifying the 
beginning of the appropriate field, and limited in size. Each object 
program reference to memory is checked against the limit and base 
addresses to determine if the address is within the bounds assigned. 
These field limits are contained in four registers that are saved in the 
Exchange Package. The four registers are: the Instruction Base Address 
(IBA) register, the Instruction Limit Address (ILA) register, the Data 
Base Address (DBA) register, and the Data Limit Address (DLA) register. 
Refer to the subsection on Memory Field Protection later in this section 
for an explanation of the registers. 



ACTIVE EXCHANGE PACKAGE 

An active Exchange Package resides in the operating registers. The 
interval of time when the Exchange Package and the program associated 
with it are active is called the execution interval. An execution 
interval begins with an exchange sequence where the subject Exchange 
Package moves from memory to the operating registers. An execution 
interval ends as the Exchange Package moves back to memory in a 
subsequent exchange sequence. 



EXCHANGE SEQUENCE 

The exchange sequence is the vehicle for moving an inactive Exchange 
Package from memory into the operating registers. At the same time, the 
exchange sequence moves the currently active Exchange Package from the 
operating registers back into memory. This swapping operation is done in 
a fixed sequence when all computational activity associated with the 
currently active Exchange Package has stopped. The same 16-word block of 
memory is used as the source of the inactive Exchange Package and the 
destination of the currently active Exchange Package. Location of this 
block is specified by the content of the XA register and is a part of the 
currently active Exchange Package. The exchange sequence can be 
initiated by a deadstart sequence, an Interrupt flag being set, or a 
program exit. 

Exchange initiated by deadstart sequence 

The deadstart sequence forces the XA register content to 0 for all CPUs 
and also forces an interrupt in one CPU. These two actions cause an 
exchange using memory address 0 as the location of the Exchange Package. 
The inactive Exchange Package at address 0 then moves into the operating 
registers and initiates a program using these parameters. The Exchange 
Package swapped to address 0 is largely indeterminate because of the 
deadstart operation. New data entered at these storage addresses then 
discards the old Exchange Package in preparation for starting subsequent 
CPUs with an interprocessor interrupt. 

When instruction 0014j1 (IP) is issued in the first CPU, the CPU 
associated with processor number j exchanges to address 0 in memory. 
(A set of switches on the mainframe's control panel associates processor 
number with CPU number and selects which CPU is deadstarted first.) 

Exchange initiated by Interrupt flag set 

An exchange sequence can be initiated by setting any one of the Interrupt 
flags in the F register. Setting of one or more flags causes a Request 
Interrupt signal to initiate an exchange sequence. 

Exchange initiated by program exit 

Two program exit instructions initiate an exchange sequence. Timing of 
the instruction execution is the same in either case; the difference is 
determined by which of the two flags is set in the F register. The two 
instructions are: 

000 ERR Error exit 

004 EX Normal exit 

The two exits enable a program to request its own termination. A 
non-monitor (object) program usually uses the normal exit instruction to 
exchange back to the monitor program. The error exit allows for abnormal 
termination of an object program. The exchange address selected is the 
same as for a normal exit. 

Each instruction has a flag in the F register. The appropriate flag is 
set if the currently active Exchange Package is not in monitor mode. The 
inactive Exchange Package called in this case is normally one that 
executes in monitor mode. Flags are checked for evaluation of the 
program termination cause. 

The monitor program selects an inactive Exchange Package for activation 
by setting the address of the inactive Exchange Package in the XA 
register and then executing a normal exit instruction. 

Exchange sequence issue conditions 

The following are hold issue conditions, execution time, and special 
cases for an exchange sequence. 

Hold conditions: 

• NIP register contains a valid instruction 

• S, V, or A registers busy 

Execution time: 

For 64 banks, 40 CPs; consists of an exchange sequence (24 CPs) and a 
fetch operation (16 CPs). 

Special cases: 

If a test and set instruction is holding in the CIP register, both 
CIP and NIP registers are cleared and the exchange occurs with the WS 
(Waiting for Semaphore) flag set and the P register pointing to the 
test and set instruction. 



EXCHANGE PACKAGE MANAGEMENT 

Each 16-word Exchange Package resides in an area defined during system 
deadstart. The defined area must lie within the lower 4096 (10,000₈) 
words of memory. The package at address 0 is the deadstart monitor 
program's Exchange Package. Other packages provide for object programs 
and monitor tasks. Non-monitor packages lie outside of the field lengths 
for the programs they represent as determined by the base and limit 
addresses for the programs. Only the monitor program has a field defined 
so that it can access all of memory, including Exchange Package areas. 
The defined field allows the monitor program to define or alter all 
Exchange Packages other than its own when it is the currently active 
Exchange Package. Since no interlock exists between an exchange sequence 
in a CPU and memory transfers in another CPU, modification of Exchange 
Packages which can be used by another CPU should be avoided, except under 
software controlled situations. 

Proper management of Exchange Packages dictates that a non-monitor 
program always exchanges back to the monitor program that exchanged to 
it. The exchange ensures that the program information is always 
exchanged into its proper Exchange Package. 

For example, the monitor program (A) begins an execution interval 
following deadstart. No interrupts (except memory) can terminate its 
execution interval since it is in monitor mode. Program A voluntarily 
exits by issuing a normal exit instruction (004). However, before doing 
so, program A sets the contents of the XA register to point to the user 
program (B) Exchange Package so that program B is the next program to 
execute. Program A sets the exchange address in program B's Exchange 
Package to point back to program A. 

The exchange sequence to program B causes the exchange address from 
program B's Exchange Package to be entered in the XA register. At the 
same time, the exchange address in the XA register goes to program B's 
Exchange Package area with all other program parameters for program A. 
When the exchange is complete, program B begins its execution interval. 

To illustrate the exchange sequence, assume that while program B is 
executing, an Interrupt flag sets initiating an exchange sequence. Since 
program B cannot alter the XA register, the exit is back to program A. 
Program B's parameters exchange back into its Exchange Package area; 
program A's parameters held in program B's package area during the 
execution interval exchange back into the operating registers. 

Program A, upon resuming execution, determines an interrupt has caused 
the exchange and sets the XA register to call the proper interrupt 
processor into execution. To do this, program A sets XA to point to the 
Exchange Package for the interrupt processing program (C). Program A 
clears the interrupt and initiates execution of program C by executing a 
normal exit instruction (004). Depending on the operating task, program 
C can execute in monitor mode or in user mode. 
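
The A, B, and C walk-through above can be traced with a toy model of the swap. This is a sketch under stated assumptions: packages are reduced to a name and an XA value, memory is a dictionary keyed by package address, and the class and field names are invented for the example. The sketch also reads "point back to program A" as meaning that program B's exchange address holds the address of B's own package area, where program A's parameters wait while B runs.

```python
class Cpu:
    """Hypothetical CPU holding one active Exchange Package."""

    def __init__(self, memory, active):
        self.memory = memory    # package address -> inactive package
        self.active = active    # package in the operating registers

    def exchange(self):
        """Swap the active package with the one addressed by XA."""
        address = self.active["XA"] << 4
        incoming = self.memory[address]
        self.memory[address] = self.active
        self.active = incoming


def run_example():
    """Trace A -> B -> A -> C and return the order of active programs."""
    memory = {
        16: {"name": "B", "XA": 1},   # B's package area; XA leads back here
        32: {"name": "C", "XA": 2},   # C's package area
    }
    cpu = Cpu(memory, {"name": "A", "XA": 1})   # A points XA at B's area
    trace = [cpu.active["name"]]
    cpu.exchange()                 # A issues a normal exit: B runs
    trace.append(cpu.active["name"])
    cpu.exchange()                 # an Interrupt flag sets: back to A
    trace.append(cpu.active["name"])
    cpu.active["XA"] = 2           # A selects the interrupt processor C
    cpu.exchange()
    trace.append(cpu.active["name"])
    return trace, memory
```

After the final exchange, program A's parameters rest in program C's package area, and program B's parameters are back in its own area.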

Further information on Exchange Package management is contained in the 
COS EXEC/STP/CSP Internal Reference Manual, publication SM-0040. 



MEMORY FIELD PROTECTION 

At execution time, each object program has a designated field of memory 
for instructions and data. The field limits are specified by the monitor 
program when the object program is loaded and initiated. The fields can 
begin at any word address that is a multiple of 64 (that is, 100₈) and 
can continue to another address that is one less than a multiple of 64. 
The fields can overlap. 

All memory addresses contained in the object program code are relative to 
one of the two base addresses specifying the beginning of the appropriate 
field. An object program cannot read or alter any memory location with 
an absolute address lower than that base address. Each object program 
reference to memory is checked against the limit and base addresses to 
determine if the address is within the bounds assigned. A memory read 
reference beyond the assigned field limits issues and completes, but a 
zero value is transferred from memory. A memory write reference beyond 
the assigned field limits is allowed to issue, but no write occurs. 

Field limits are contained in four registers: the Instruction Base 
Address (IBA) register, the Instruction Limit Address (ILA) register, the 
Data Base Address (DBA) register, and the Data Limit Address (DLA) 
register. These four registers and flags associated with the field 
limits are described below. 



INSTRUCTION BASE ADDRESS REGISTER 

The Instruction Base Address (IBA) register holds the base address of the 
user's instruction field. An instruction can only be executed by the CPU 
if the absolute address at which the instruction is located is greater 
than or equal to the contents of the current Exchange Package IBA 
register of the program executing. This determination is made at 
instruction buffer fetch time by the CPU. 

The contents of the IBA register are interpreted as the high-order 18 
bits of a 24-bit memory address. The low-order 6 bits of the address are 
assumed to be 0 because of the number of banks (64 decimal). 
Absolute memory addresses for an instruction fetch are formed by adding 
the IBA register to the P register (high-order 22 bits) modulo two to the 
twenty-fourth power. 

A reference to an absolute address less than the address defined by IBA 
can only occur through a jump or branch instruction to an address beyond 
the memory capacity of the machine. 
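
The address formation described above can be sketched in Python. This is only an illustration under stated assumptions: the function name and its arguments are hypothetical, the P register is taken to be a 24-bit parcel address whose high-order 22 bits select a word, and the low-order 6 bits of the base are taken to be 0.

```python
ADDRESS_MASK = (1 << 24) - 1  # absolute addresses are 24 bits wide


def instruction_fetch_address(iba, p):
    """Form an absolute word address from the IBA and P registers.

    iba: 18-bit register value (high-order bits of a 24-bit address).
    p:   24-bit parcel address; its high-order 22 bits give the word address.
    """
    base = (iba << 6) & ADDRESS_MASK      # low-order 6 bits assumed to be 0
    word = p >> 2                         # drop the 2-bit parcel select (assumption)
    return (base + word) & ADDRESS_MASK   # sum taken modulo 2**24
```

With the largest 18-bit base, a sufficiently large P value wraps around modulo 2**24 and lands below the base, which is the only way a fetch address lower than the base can arise.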



INSTRUCTION LIMIT ADDRESS REGISTER 

The Instruction Limit Address (ILA) register holds the limit address of 
the user's instruction field. An instruction can only be executed by the CPU if the 
absolute address where it is located is less than the contents of the 
current Exchange Package ILA register of the program executing. This 
determination is made at instruction buffer fetch time by the CPU. 

The contents of the ILA register are interpreted as the high-order 18 
bits of a 24-bit memory address. The low-order 6 bits of the address are 
assumed to be 0 because of the number of banks (64 decimal). The 
largest absolute address that can be executed by a program is defined by 
[(ILA) x 2^6] - 1. 

If the final absolute address of the instruction buffer fetch as computed 
by the CPU does not fall between the range of addresses contained within 
the currently executing Exchange Package IBA and ILA registers, the CPU 
generates a program range error interrupt. 



DATA BASE ADDRESS REGISTER 

The Data Base Address (DBA) register holds the base address of the user's 
data field. An operand can only be fetched or stored by the CPU if the 
absolute address where the operand is located is greater than or equal to 
the contents of the current Exchange Package DBA register of the program 
executing. This determination is made each time an operand is fetched or 
stored by the CPU. 

The contents of the DBA register are interpreted as the high-order 18 
bits of a 24-bit memory address. The low-order 6 bits of the DBA 
register are assumed to be 0. Absolute memory addresses for operands are 
formed by adding the DBA register to the modified operand address modulo 
two to the twenty-fourth power. 



DATA LIMIT ADDRESS REGISTER 

The Data Limit Address (DLA) register holds the (upper) limit address of 
the user's data field. An operand can only be fetched or stored by the 
CPU if the absolute address where the operand is located is less than the 
contents of the current Exchange Package DLA register of the program 
executing. This determination is made each time an operand is fetched or 
stored by the CPU. 

The contents of the DLA register are interpreted as the high-order 18 
bits of a 24-bit memory address. The low-order 6 bits of the DLA 
register are assumed to be 0. The largest absolute address that can be 
referenced for data by a program is defined by [(DLA) x 2^6] - 1. 

If the final absolute address of the operand as computed by the CPU does 
not fall between the range of addresses contained within the currently 
executing Exchange Package DBA and DLA registers, the CPU generates an 
operand (address) range error interrupt. 
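
As a rough sketch of the operand check described above, the following Python functions form an absolute operand address and test it against the data field. The function names are hypothetical, and the sketch assumes both registers hold 18-bit values whose low-order 6 address bits are 0.

```python
ADDRESS_MASK = (1 << 24) - 1  # absolute addresses are 24 bits wide


def operand_address(dba, relative):
    """Add the base address to a program-relative address, modulo 2**24."""
    return ((dba << 6) + relative) & ADDRESS_MASK


def operand_in_range(dba, dla, relative):
    """Return True if the operand lies inside the assigned data field.

    The field runs from (DBA) x 2**6 up to [(DLA) x 2**6] - 1 inclusive.
    """
    absolute = operand_address(dba, relative)
    return (dba << 6) <= absolute < (dla << 6)
```

For example, with a DBA value of 2 and a DLA value of 4 the field covers absolute addresses 128 through 255, so relative address 127 is the last one inside the field.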



PROGRAM RANGE ERROR 

The Program Range Error flag sets if a memory reference outside the 
boundaries of the IBA and ILA registers is for an instruction fetch. An 
out-of-range memory reference can occur in a non-monitor mode program on 
a branch or jump instruction calling for a program address above or below 
the limits. The Program Range Error flag causes an error condition that 
terminates program execution. The monitor program checks the state of 
the Program Range Error flag and takes appropriate action, perhaps 
aborting the user program. 



OPERAND RANGE ERROR 

The Operand Range Error flag sets if the Operand Range Error Mode flag is 
set and a memory reference outside the boundaries of the DBA and DLA 
registers is called to read or write an operand for an A, B, S, T, or V 
register and the Operand Range Interrupt Error flag is set. The Operand 
Range Error flag causes an error condition that terminates the user 
program execution. The monitor program checks the state of the Operand 
Range Error flag and takes appropriate action, perhaps aborting the user 
program. 



PROGRAMMABLE CLOCK 

The programmable clock can be used to accurately measure the duration of 
intervals. Intervals selected under monitor program control generate a 
periodic interrupt. The clock frequency is 105 MHz. Intervals from 9.5 
nanoseconds to approximately 40.8 seconds are possible. Intervals 
shorter than 100 microseconds are not practical due to the monitor 
overhead involved in processing the interrupt. Supporting the 
programmable clock are the Interrupt Interval (II) register, the 
Interrupt Countdown (ICD) counter, and four monitor mode instructions. 
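
The interval limits quoted above follow from the clock period and the 32-bit width of the interval register. A minimal Python sketch of the conversion is shown here; the function names are hypothetical, and the 9.5-nanosecond clock period is the rounded figure used in the text.

```python
CP_NS = 9.5     # clock period in nanoseconds, as quoted in the text
II_BITS = 32    # width of the Interrupt Interval register


def interval_in_clock_periods(seconds):
    """Convert a desired interval to a count of clock periods (CPs)."""
    cps = round(seconds * 1e9 / CP_NS)
    if not 1 <= cps < 2 ** II_BITS:
        raise ValueError("interval does not fit the 32-bit interval register")
    return cps


def longest_interval_seconds():
    """Longest interval expressible with a 32-bit count of CPs."""
    return (2 ** II_BITS - 1) * CP_NS * 1e-9
```

A 100-microsecond interval, the practical lower bound mentioned above, corresponds to roughly ten thousand clock periods.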



INSTRUCTIONS 

Four monitor mode instructions support the programmable clock: 

0014j4    PCI Sj    Enter Interrupt Interval (II) register with (Sj). 

001405    CCI       Clear the programmable clock interrupt request. 

001406    ECI       Enable the programmable clock interrupt request. 

001407    DCI       Disable the programmable clock interrupt request. 



INTERRUPT INTERVAL REGISTER 

The 32-bit Interrupt Interval (II) register can be loaded with a binary 
value equal to the number of CPs that are to elapse between programmable 
clock interrupt requests. The interrupt interval is transferred from the 
low-order 32 bits of the Sj register into the II register and the ICD 
counter when instruction 0014j4 is executed. 

This value is held in the II register and is transferred to the ICD 
counter each time the counter reaches 0 and generates an interrupt 
request. The content of the II register is changed only by another 
instruction 0014j4. 



INTERRUPT COUNTDOWN COUNTER 

The 32-bit Interrupt Countdown (ICD) counter is preset to the contents of 
the II register when instruction 0014j4 is executed. This counter runs 
continuously but counts down, decrementing by 1 each CP until the content 
of the counter is 0. The ICD sets the programmable clock interrupt 
request and samples the interval value held in the II register. The ICD 
repeats the countdown to zero cycle, setting the programmable clock 
interrupt request at regular intervals determined by the interval value. 

When the programmable clock interrupt request is set, it remains set 
until a clear programmable clock interrupt request is executed. A 
programmable clock interrupt request can be set only after the enable 
programmable clock interrupt request is executed. A programmable clock 
interrupt request causes an interrupt only when not in monitor mode. A 
request set in monitor mode is held until the system switches to user 
mode. 
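
The countdown behavior can be modeled with a small Python class. This is a simplified sketch, not a description of the hardware: the class and method names are hypothetical (the methods are named after the four instruction mnemonics), the exact clock period on which the request sets is an assumption, and the monitor mode hold is not modeled.

```python
class ProgrammableClock:
    """Toy model of the II register, ICD counter, and interrupt request."""

    def __init__(self):
        self.ii = 0            # Interrupt Interval register
        self.icd = 0           # Interrupt Countdown counter
        self.enabled = False   # set by ECI, cleared by DCI
        self.request = False   # programmable clock interrupt request

    def pci(self, sj):
        """0014j4: load II and ICD from the low-order 32 bits of (Sj)."""
        self.ii = sj & 0xFFFFFFFF
        self.icd = self.ii

    def cci(self):
        """001405: clear the interrupt request."""
        self.request = False

    def eci(self):
        """001406: enable the interrupt request."""
        self.enabled = True

    def dci(self):
        """001407: disable the interrupt request."""
        self.enabled = False

    def tick(self):
        """Advance one clock period."""
        if self.icd > 0:
            self.icd -= 1
        if self.icd == 0:
            if self.enabled:
                self.request = True   # stays set until cci() is called
            self.icd = self.ii        # reload the interval and repeat
```

In this model, loading an interval of 3 and enabling the request produces a request on every third call to tick(), provided each request is cleared with cci().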



CLEAR PROGRAMMABLE CLOCK INTERRUPT REQUEST 

Following a program interrupt interval, an active programmable clock 
interrupt request can be cleared by executing instruction 001405. 

Following any deadstart, the monitor program should ensure the state of 
the programmable clock interrupt by issuing instructions 001405 and 
001407. 



PERFORMANCE MONITOR 

The system contains a set of eight performance counters to track certain 
hardware related events that can be used to indicate relative 
performance. The events that can be tracked are the number of specific 
instructions issued, hold issue conditions, the number of fetches, 
references, etc., and are selected through instruction 0015j0. Refer 
to Appendix C for complete information on performance monitoring. 



DEADSTART SEQUENCE 

The deadstart sequence of operations starts a program running in the 
mainframe after power has been turned off and then turned on again or 
whenever the operating system is to be reinitialized in the mainframe. 
All registers in the machine, all control latches, and all words in 
memory should be considered invalid after power has been turned on. The 
following sequence of operations to begin the program is initiated by the 
I/O Subsystem. 

1. Turn on Master Clear signal. 

2. Turn on I/O Clear signal. 

3. Turn off I/O Clear signal. 

4. Load memory via I/O Subsystem. 

5. Turn off Master Clear signal. 

The Master Clear signal halts all internal computation and forces 
critical control latches to predetermined states. The I/O Clear signal 
clears the input Channel Address register of the MCU channel and 
activates the MCU input channel. All other input channels remain 
inactive. The I/O Subsystem then loads an initial Exchange Package and 
monitor program. The Exchange Package must be located at address 0 in 
memory. Turning off the Master Clear signal initiates the exchange 
sequence to read this package and to begin execution of the monitor 
program in CPU 0 (PN=0). 

The other CPUs remain in a master-cleared state until instruction 
0014j1 (IP) is issued in the CPU with PN=0. Then the CPU with PN=j 
exchanges to address 0 in memory. 

Because the exchange of CPU 0 overwrites the contents of the inactive 
Exchange Package at address 0, CPU 0 must reinitialize the Exchange 
Package at address 0 before allowing other CPUs to start. (Any CPU can 
be started first by using a switch on the control panel.) Subsequent 
actions are dictated by the design of the operating system. 



CPU COMPUTATION SECTION 



INTRODUCTION 

Each CPU contains an identical, independent computation section. A 
computation section consists of operating registers and functional units 
associated with three types of processing: address, scalar, and vector. 
Address processing operates on internal control information such as 
addresses and indexes and has two levels of 24-bit registers and two 
integer arithmetic functional units. Scalar and vector processing are 
performed on data. 

A vector is an ordered set of elements. A vector instruction operates on 
a series of elements repeating the same function and producing a series 
of results. Scalar processing starts an instruction, handles one operand 
or operand pair, then produces a single result. 

The main advantage of vector over scalar processing is eliminating 
instruction start-up time for all but the first operand. Scalar 
processing has two levels of 64-bit scalar registers, four functional 
units dedicated solely to scalar processing, and three floating-point 
functional units shared with vector operations. Vector processing has a 
set of 64-element registers of 64 bits each, five functional units 
dedicated solely to vector applications, and three floating-point 
functional units supporting both scalar and vector operations. 

Address information flows from Central Memory or from control registers 
to address registers. Information in the address registers is 
distributed to various parts of the control network for use in 
controlling the scalar, vector, and I/O operations. The address 
registers can also supply operands to two integer functional units. The 
units generate address and index information and return the result to the 
address registers. Address information can also be transmitted to 
Central Memory from the address registers. 

Data flow in a computation section is from Central Memory to registers 
and from registers to functional units. Results flow from functional 
units to registers and from registers to Central Memory or back to 
functional units. Data flows along either the scalar or vector path 
depending on the processing mode. An exception is that scalar registers 
can provide one required operand for vector operations performed in the 
vector functional units. 

Integer or floating-point arithmetic operations are performed in the 
computation section. Integer arithmetic is performed in twos complement 
mode. Floating-point quantities have signed magnitude representation. 

Floating-point instructions provide for addition, subtraction, 
multiplication, and reciprocal approximation. The reciprocal 
approximation instructions provide for a floating-point divide operation 
using a multiple instruction sequence. These instructions produce 64-bit 
results (1-bit sign, 15-bit exponent, and 48-bit normalized coefficient). 
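
The three fields of a 64-bit floating-point result can be separated with shifts and masks, as in the Python sketch below. The function names are hypothetical, and the sketch assumes the fields are laid out from the high-order end in the order listed (sign, then exponent, then coefficient). It does not interpret the exponent or coefficient values.

```python
COEFFICIENT_MASK = (1 << 48) - 1   # low-order 48 bits
EXPONENT_MASK = (1 << 15) - 1      # 15 bits above the coefficient


def unpack_float_word(word):
    """Split a 64-bit word into (sign, exponent, coefficient) fields."""
    sign = (word >> 63) & 1
    exponent = (word >> 48) & EXPONENT_MASK
    coefficient = word & COEFFICIENT_MASK
    return sign, exponent, coefficient


def pack_float_word(sign, exponent, coefficient):
    """Assemble a 64-bit word from the three fields."""
    return (((sign & 1) << 63)
            | ((exponent & EXPONENT_MASK) << 48)
            | (coefficient & COEFFICIENT_MASK))
```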

Integer or fixed-point operations are integer addition, integer 
subtraction, and integer multiplication. Integer addition and 
subtraction operations produce either 24-bit or 64-bit results. An 
integer multiply operation produces a 24-bit result. A 64-bit integer 
multiply operation is done through a software algorithm using the 
floating-point multiply functional unit to generate multiple partial 
products. These partial products are then shifted and merged to form the 
full 64-bit product. No integer divide instruction is provided; the 
operation is accomplished through a software algorithm using 
floating-point hardware. 

The instruction set includes Boolean operations for OR, AND, equivalence, 
and exclusive OR and for a mask-controlled merge operation. Shift 
operations allow the manipulation of either 64-bit or 128-bit operands to 
produce 64-bit results. With the exception of 24-bit integer arithmetic, 
most operations are implemented in vector and scalar instructions. The 
integer product is a scalar instruction designed for index calculation. 
Full indexing capability allows the programmer to index throughout memory 
in either scalar or vector modes. The index can be positive or negative 
in either mode. Indexing allows matrix operations in vector mode to be 
performed on rows or the diagonal as well as conventional column-oriented 
operations. 

Population and parity counts are provided for both vector and scalar 
operations. An additional scalar operation is the leading zero count. 

Characteristics of a CPU computation section are summarized below. 

• Integer and floating-point arithmetic 

• Twos complement integer arithmetic 

• Signed magnitude floating-point arithmetic 

• Address, scalar, and vector processing modes 

• Fourteen functional units 

• Eight 24-bit address (A) registers 

• Sixty-four 24-bit intermediate address (B) registers 

• Eight 64-bit scalar (S) registers 

• Sixty-four 64-bit intermediate scalar (T) registers 

• Eight 64-element vector (V) registers, 64 bits per element 



OPERATING REGISTERS 

Operating registers, a primary programmable resource of a CPU, enhance 
the speed of the system by satisfying heavy demands for data made by the 
functional units. A single functional unit can require one to three 
operands per clock period (CP) to perform the necessary functions and can 
deliver results at a rate of one per CP. Multiple functional units can 
be used concurrently. 

A CPU has three primary and two intermediate sets of registers. The 
primary sets of registers are address, scalar, and vector, designated in 
this manual as A, S, and V, respectively. These registers are considered 
primary because functional units can access them directly. 

For the A and S registers, an intermediate level of registers exists 
which is not accessible to the functional units but acts as a buffer for 
the primary registers. Block transfers are possible between these 
registers and Central Memory so that the number of memory reference 
instructions required for scalar and address operands is greatly 
reduced. The intermediate registers that support the A registers are 
referred to as B registers. The intermediate registers that support S 
registers are referred to as T registers. 



ADDRESS REGISTERS 

Figure 4-1 illustrates registers and functional units used for address 
processing. The two types of address registers are designated A 
registers and B registers and are described in the following paragraphs. 



A REGISTERS 

Eight 24-bit A registers serve a variety of applications but are 
primarily used as address registers for memory references and as index 
registers. They provide values for shift counts, loop control, and 
channel I/O operations and receive values of population count and leading 
zeros count. In address applications, A registers index the base address 
for scalar memory references and provide both a base address and an 
address increment for vector memory references. 

The address functional units support address and index generation by 
performing 24-bit integer arithmetic on operands obtained from A 
registers and by delivering the results to A registers. 



Figure 4-1. Address registers and functional units 



Data is moved directly between Central Memory and A registers or is 
placed in B registers. Placing data in B registers allows buffering of 
the data between A registers and Central Memory. Data can also be 
transferred between A and S registers and between A and Shared Address 
(SB) registers. 

The Vector Length (VL) register and Exchange Address (XA) register are 
set by transmitting a value to them from an A register. The VL register 
can also be transmitted to an A register. (The VL register is described 
under Vector Control Registers later in this section.) 

When an issued instruction delivers new data to an A register, a 
reservation is set for that register. The reservation prevents issue of 
instructions that use the register until the new data is delivered. 

In this manual, the A registers are individually referred to by the 
letter A followed by a number ranging from 0 through 7. Instructions 
reference A registers by specifying the register number as the h, i, 
j, or k designator as described in section 5. 

The only register implicitly referenced is the A0 register as illustrated 
in the following instructions: 



010ijkm    JAZ exp       Branch to ijkm if (A0)=0. 

011ijkm    JAN exp       Branch to ijkm if (A0)≠0. 

012ijkm    JAP exp       Branch to ijkm if (A0) is positive; 
                         includes (A0)=0. 

013ijkm    JAM exp       Branch to ijkm if (A0) is negative. 

034ijk     Bjk,Ai ,A0    Read (Ai) words to B register jk from (A0). 

035ijk     ,A0 Bjk,Ai    Store (Ai) words at B register jk to (A0). 

036ijk     Tjk,Ai ,A0    Read (Ai) words to T register jk from (A0). 

037ijk     ,A0 Tjk,Ai    Store (Ai) words at T register jk to (A0). 

176i0k     Vi ,A0,Ak     Read (VL) words to Vi from (A0) 
                         incremented by (Ak). 

1770jk     ,A0,Ak Vj     Store (VL) words from Vj to (A0) 
                         incremented by (Ak). 



Section 5 of this manual contains additional information on the use of A 
registers by instructions. 



B REGISTERS 

A computation section contains sixty-four 24-bit B registers used as 
intermediate storage for the A registers. Typically, B registers contain 
data to be referenced repeatedly over a sufficiently long span, making it 
unnecessary to retain the data in either A registers or in Central 
Memory. Examples of uses are loop counts, variable array base addresses, 
and dimensions. 

Transfer of a value between an A register and a B register requires only 
1 CP. A block of B registers can be transferred to or from Central 
Memory at the maximum rate of one 24-bit value per CP. A reservation is 
made on all B registers during block transfers to and from B registers. 



NOTE 



Other instructions can issue on the CRAY X-MP while a 
block of B registers is being transferred to or from 
Central Memory. 



In this manual, B registers are individually referred to by the letter B 
followed by a 2-digit octal number ranging from 00 through 77. 
Instructions reference B registers by specifying the B register number in 
the jk designator as described in section 5. 

The only B register implicitly referenced is the B00 register. On 
execution of the return jump instruction, 007ijkm, register B00 is set 
to the next instruction parcel address (P) and a branch to an address 
specified by ijkm occurs. Upon receiving control, the called routine 
conventionally saves (B00) so that the B00 register is available for the 
called routine to initiate return jumps of its own. When a called 
routine wishes to return to its caller, it restores the saved address and 
executes instruction 0050jk. Conventionally, this instruction, which 
is a branch to (Bjk), causes the address saved in Bjk to be entered 
into the P register as the address of the next instruction parcel to be 
executed. 
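
The calling convention described above can be traced with a toy Python model. The class and function names are hypothetical, the parcel addresses in the usage are made up for illustration, and only the P register and the B registers are modeled.

```python
class AddressState:
    """Minimal model: the P register and sixty-four B registers."""

    def __init__(self):
        self.p = 0
        self.b = [0] * 64


def return_jump(state, target, next_parcel):
    """Return jump: save the next parcel address in B00, then branch."""
    state.b[0] = next_parcel
    state.p = target


def branch_to_b(state, jk):
    """Branch to (Bjk): enter the address held in Bjk into P."""
    state.p = state.b[jk]
```

A called routine that makes a return jump of its own would first copy B00 to another B register, then copy it back before branching to (B00):

```python
state = AddressState()
return_jump(state, 0o2000, 0o102)    # caller invokes routine at 2000
state.b[1] = state.b[0]              # routine saves (B00) in B01
return_jump(state, 0o3000, 0o2004)   # routine makes a nested call
branch_to_b(state, 0)                # nested routine returns to 2004
state.b[0] = state.b[1]              # routine restores the saved address
branch_to_b(state, 0)                # routine returns to its caller at 102
```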



SCALAR REGISTERS 

Figure 4-2 illustrates registers and functional units used for scalar 
processing. The two types of scalar registers are designated S registers 
and T registers and are described in the following paragraphs. 



S REGISTERS 

Eight 64-bit S registers are the principal scalar registers for a CPU 
serving as the source and destination for operands executing scalar 
arithmetic and logical instructions. Scalar functional units perform 
both integer and floating-point arithmetic operations. 



Figure 4-2. Scalar registers and functional units 



S registers can furnish one operand in vector instructions. Single-word 
transmissions of data between an S register and an element of a V 
register are also possible. 

Data is moved directly between Central Memory and S registers or is 
placed in T registers. This intermediate step allows buffering of scalar 
operands between S registers and Central Memory. Data is also 
transferred between A and S registers, between S and Shared Scalar (ST) 
registers, and between S and Semaphore (SM) registers. 

Other uses of the S registers are the setting or reading of the Vector 
Mask (VM) register or the Real-time Clock (RTC) register or setting the 
Interrupt Interval (II) register. 

When an issuing instruction delivers new data to an S register, a 
reservation is set for that register preventing issue of instructions 
that read the register until the new data is delivered. 

In this manual, the S registers are individually referred to by the 
letter S followed by a number ranging from 0 through 7. Instructions 
reference S registers by specifying the register number as the i, j, 
or k designator as described in section 5. 

The only register implicitly referenced is the S0 register as illustrated 
in the following instructions. 



014ijkm    JSZ exp       Branch to ijkm if (S0)=0. 

015ijkm    JSN exp       Branch to ijkm if (S0)≠0. 

016ijkm    JSP exp       Branch to ijkm if (S0) is positive; 
                         includes (S0)=0. 

017ijkm    JSM exp       Branch to ijkm if (S0) is negative. 

052ijk     S0 Si<exp     Shift (Si) left jk places to S0. 

053ijk     S0 Si>exp     Shift (Si) right jk places to S0. 



The 8-bit Status register provides the status of the following flags: 

• Processor Number (PN) 

• Program State (PS) 

• Cluster Number (CN) 

• Floating-point Interrupts Enabled (IFP) 

• Floating-point Error (FPE) 

• Bidirectional Memory Enabled (BDM) 

• Operand Range Interrupts Enabled (IOR) 

Instruction 073 sends the contents of the Status register to an S 
register. 

Section 5 of this manual has additional information on the use of S 
registers by instructions. 



T REGISTERS 

The computation section has sixty-four 64-bit T registers used as 
intermediate storage for the S registers. Data is transferred between T 
and S registers and between T registers and Central Memory. Transfer of 
a value between a T register and an S register requires only 1 CP. 
T registers reference Central Memory through block read and block write 
instructions. Block transfers occur at a maximum rate of one word per 
CP. A reservation is made on all T registers during block transfers to 
and from T registers. 



NOTE 

Other instructions can issue on the CRAY X-MP while a 
block of T registers is being transferred to or from 
Central Memory. 



In this manual, T registers are referred to by the letter T and a 2-digit 
octal number ranging from 00 through 77. Instructions reference T 
registers by specifying the octal number as the jk designator as 
described in section 5. 



VECTOR REGISTERS 

Figure 4-3 illustrates the registers and functional units used for vector 
operations. Vector registers and Vector Control registers are described 
in the following paragraphs. 



V REGISTERS 

The major computational registers of a CPU are eight V registers, each 
with 64 elements. Each V register element has 64 bits. When associated 
data is grouped into successive elements of a V register, the register 
quantity can be treated as a vector. Examples of vector quantities are 
rows or columns of a matrix or elements of a table. Computational 
efficiency is achieved by identically processing each element of a 
vector. Vector instructions provide for the iterative processing of 
successive V register elements. A vector operation always begins when 
operands are obtained from the first element of the operand V registers 
and the result is delivered to the first element of a V register. 
Successive elements are provided each CP and as each operation is 
performed, the result is delivered to successive elements of the result V 
register. The vector operation continues until the number of operations 
performed by the instruction equals a count specified by the content of 
the VL register. 

Figure 4-3. Vector registers and functional units 



Contents of a V register are transferred to or from Central Memory in a 
block mode by specifying a first word address in Central Memory, an 
increment or decrement for the Central Memory address or a set of indexes 
contained in a separate vector register, and a vector length. The 
transfer then proceeds beginning with the first element of the V register 
at a maximum rate of one word per CP, depending upon bank conflicts. 

Discontinuities in the vector data stream can occur as a result of memory 
conflicts. These discontinuities, although not inhibiting chained 
operations, can appear in the chained operation data stream. Any 
discontinuity in the data stream adds proportionally to the total 
execution time of the vector operation. 

Single-word data transfers are possible between an S register and an 
element of a V register. 

Since many vectors exceed 64 elements, a long vector is processed as one 
or more 64-element segments and a possible remainder of less than 64 
elements. Generally, it is convenient to compute the remainder and 
process this short segment before processing the remaining number of 
64-element segments. However, a programmer can choose to construct the 
vector loop code in a number of ways. The processing of long vectors in 
FORTRAN is handled by the compiler and is transparent to the programmer. 
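
The segmentation of a long vector can be sketched in Python as follows. The function name is hypothetical; the sketch simply follows the ordering suggested above, handling the short remainder segment first and the full 64-element segments afterward.

```python
def strip_mine(n, max_vl=64):
    """Split a vector of n elements into (start, length) segments.

    The short remainder segment, if any, comes first, followed by
    as many full max_vl-element segments as are needed.
    """
    segments = []
    start = 0
    remainder = n % max_vl
    if remainder:
        segments.append((start, remainder))
        start += remainder
    while start < n:
        segments.append((start, max_vl))
        start += max_vl
    return segments
```

A 150-element vector, for example, splits into one 22-element segment followed by two 64-element segments. Each segment length would be the vector length in effect while that segment is processed.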

A V register receiving results can also supply operands to a subsequent 
operation. Using a register as both a result and operand register in two 
different operations allows for the chaining together of two or more 
vector operations and two or more results can be produced per CP. 
Chained operations are detected automatically by a CPU and are not 
explicitly specified by the programmer. A programmer can reorder certain 
code segments to gain as much concurrency as possible in chained 
operations. 

A conflict can occur between vector and scalar operations involving 
floating-point operations and memory access. With the exception of these 
operations, the functional units are always available for scalar 
operations. A vector operation occupies the selected functional unit 
until the vector is processed. 

Parallel vector operations can be processed in two ways: 

• Using different functional units and all different V registers 

• Using the result stream from one V register simultaneously as the 
operand to another operation using a different functional unit 
(chain mode) 

Parallel operations on vectors allow the generation of two or more 
results per CP. Most vector operations use two V registers as operands 
or one S and one V register as operands. Exceptions are vector shifts, 
vector logicals, vector reciprocals, and the load or store instructions. 

In this manual, the V registers are individually referred to by the 
letter V followed by a number ranging from 0 through 7. Vector 
instructions reference V registers by specifying the register number as 
the i, j, or k designator as described in section 5. 

Individual elements of a V register are designated in this manual by 
decimal numbers ranging from 00 through 63. These appear as subscripts 
to vector register references. For example, V6₂₉ refers to element 29 
of V register 6. 



NOTE 

Parallel loading and storing of V registers is 
possible; two load operations and one store operation 
can occur simultaneously. 



V register reservations and chaining 

Reservation describes the condition of a register in use; that is, the 
register is not available for another operation as a result or as an 
operand register. Each register has two reservation conditions: one 
reserving it as an operand register and one reserving it as a result 
register. During execution of a vector instruction, reservations are 
placed on the operand V registers and on the result V register. These 
reservations are placed on the registers themselves, not on individual 
elements of the V register. 

If a V register is reserved as a result and not as an operand, it can be 
used at any time as an operand and chaining occurs. This flexible 
chaining mechanism allows chaining to begin at any point in the result 
vector data stream. Full chaining occurs if the instruction causing 
chaining is issued before or at the time element 0 of the result arrives 
at the V register. Partial chaining occurs if the instruction issues 
after the arrival of element 0. Thus, the amount of concurrency in a 
chained operation depends upon the relationship between the issue time of 
the chaining instruction and the result vector data stream. 

If a V register is reserved as an operand, it cannot be used as a result 
or operand register until the operand reservation clears. However, a V 
register can be used as both an operand and result in the same vector 
operation. A V register can serve only one vector operation as the 
source of one or both operands. A V register can serve only one vector 
operation as a result. 

No reservation is placed on the VL register during vector processing. If 
a vector instruction employs an S register, no reservation is placed on 
the S register. The S register can be modified in the next instruction 
after vector issue without affecting the vector operation. The length 
and scalar operand (if appropriate) of each vector operation is 
maintained apart from the VL register and S register. Vector operations 
employing different lengths can proceed concurrently. 

The A0 and Ak registers in a vector memory reference are treated 
similarly and are available for modification immediately after use. 

******************************************************* 

CAUTION 

Cray Research, Inc., cautions against using a vector 
register as both a result and an operand if 
compatibility between a CRAY-1 and a CRAY X-MP system 
is necessary because vector recursion is not available 
on all Cray Research, Inc., computers.

*******************************************************



VECTOR CONTROL REGISTERS

The Vector Length (VL) register and Vector Mask (VM) register provide 
control information needed in the performance of vector operations and 
are described below. 



Vector Length register 

The 7-bit Vector Length (VL) register is set to 1 through 100₈ (VL = 0
gives VL = 100₈) specifying the length of all vector operations
performed by vector instructions and the length of the vectors held by 
the V registers. The VL register controls the number of operations 
performed for instructions 140 through 177 and is set to an A register 
value using instruction 0020 or read using instruction 023i01. 
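
The VL = 0 convention can be illustrated with a short Python sketch. The
function name is hypothetical, and the sketch models only written values
of 0 through 100₈; the behavior for other values is not assumed here.

```python
MAX_VL = 0o100  # 64 elements, the maximum vector length


def effective_vector_length(value_written: int) -> int:
    """Return the element count for a value written to VL (hypothetical helper)."""
    if not 0 <= value_written <= MAX_VL:
        raise ValueError("this sketch models only values 0 through 100 (octal)")
    # Writing 0 selects the full 64-element length.
    return MAX_VL if value_written == 0 else value_written
```

Under these assumptions, a written value of 0 and a written value of 100₈
both select 64 elements.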



Vector Mask register 

The Vector Mask (VM) register has 64 bits, each corresponding to a word
element in a V register. Bit 2^63 corresponds to element 0, bit 2^0 to
element 63. The mask is used with vector merge and test instructions to 
allow operations to be performed on individual vector elements. 

The VM register can be set from an S register through instruction 003 or 
can be created by testing a V register for a condition using instruction 
175. The mask controls element selection in the vector merge 
instructions (146 and 147). Instruction 073 sends the contents of the VM
register to an S register. 
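
The mask-to-element mapping can be sketched in Python as follows. The
helper names are hypothetical, and the merge polarity (a 1 bit selects
the first operand) is an assumption made for this example rather than a
statement taken from this section.

```python
def mask_bit(element: int) -> int:
    """Return the 64-bit VM value with only the bit for one element set."""
    if not 0 <= element <= 63:
        raise ValueError("element must be 0 through 63")
    # Element 0 maps to bit 2^63 and element 63 maps to bit 2^0.
    return 1 << (63 - element)


def vector_merge(vm: int, first: list, second: list) -> list:
    """Select first[e] where the VM bit for element e is 1, else second[e].

    The polarity is an assumption of this sketch.
    """
    return [a if vm & mask_bit(e) else b
            for e, (a, b) in enumerate(zip(first, second))]
```

For example, a mask with the bits for elements 0 and 2 set takes elements
0 and 2 from the first operand and element 1 from the second.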



FUNCTIONAL UNITS 

Instructions other than simple transmits or control operations are 
performed by specialized hardware known as functional units. Each unit 
implements an algorithm or a portion of the instruction set. Functional 
units have independent logic except for the Reciprocal Approximation, 
Vector Population Count, Floating-point Multiply and Second Vector 
Logical units (described later in this section), which share some logic. 
All functional units can be in operation at the same time. 

A functional unit receives operands from registers and delivers the 
result to a register when the function has been performed. Functional 
units operate essentially in 3-address mode with source and destination 
addressing limited to register designators. 

All functional units perform algorithms in a fixed amount of time; delays 
are impossible once the operands have been delivered to the unit. The
time required from delivery of the operands to the functional unit until
completion of the calculation is called the functional unit time and is
measured in 9.5-nanosecond CPs.

Functional units are fully segmented. This means a new set of operands 
for unrelated computation can enter a functional unit each CP even though 
the functional unit time can be more than 1 CP. This segmentation is 
possible when information arrives at the functional unit and is held in 
the functional unit or moves within the functional unit at the end of 
every CP. 

Fourteen functional units are identified in this manual and are 
arbitrarily described in four groups: address, scalar, vector, and 
floating-point. Each of the first three groups functions with one of the 
primary register types (A, S, and V) to support the address, scalar, and 
vector modes of processing available in the system. The fourth group, 
floating-point, supports either scalar or vector operations and accepts 
operands from or delivers results to S or V registers. In addition,
Central Memory acts like a fifteenth functional unit for vector 
operations. 



ADDRESS FUNCTIONAL UNITS 

Address functional units perform 24-bit integer arithmetic on operands 
obtained from A registers and deliver the results to an A register. The 
arithmetic is twos complement. 



Address Add functional unit 

The Address Add functional unit performs 24-bit integer addition and 
subtraction. The unit executes instructions 030 and 031. Addition and 
subtraction are performed in a similar manner. The twos complement 
subtraction for instruction 031 occurs when the ones complement of the 
Ak operand is added to the Aj operand. Then a 1 is added in the
low-order bit position of the result. No overflow is detected in the 
Address Add functional unit. 
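
The subtraction method described above can be sketched in Python. The
function names are hypothetical; operands are modeled as unsigned 24-bit
patterns, and the silent wraparound mirrors the absence of overflow
detection.

```python
MASK24 = (1 << 24) - 1  # 24-bit address arithmetic


def address_add(aj: int, ak: int) -> int:
    """24-bit addition; any carry out of bit 2^23 is silently dropped."""
    return (aj + ak) & MASK24


def address_subtract(aj: int, ak: int) -> int:
    """Subtract by adding the ones complement of ak, then adding 1."""
    ones_complement = ~ak & MASK24
    return (aj + ones_complement + 1) & MASK24
```

Subtracting 1 from 0 yields a word of all ones, which is -1 in 24-bit twos
complement form.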

The Address Add functional unit time is 2 CPs. 



Address Multiply functional unit 

The Address Multiply functional unit executes instruction 032 forming a 
24-bit integer product from two 24-bit operands. No rounding is 
performed. The result consists of the least significant 24 bits of the 
product.

This functional unit is designed to handle address manipulations not
exceeding its data capabilities. The programmer must be careful when 
multiplying integers in the functional unit because the unit does not 
detect overflow of the product and significant portions of the product 
could be lost. 

The Address Multiply functional unit time is 4 CPs. 
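
The loss of significant product bits can be illustrated with a Python
sketch. The function names are hypothetical, and the operands are treated
as unsigned 24-bit values for simplicity.

```python
MASK24 = (1 << 24) - 1


def address_multiply(aj: int, ak: int) -> int:
    """Return the least significant 24 bits of the product; no rounding."""
    return (aj * ak) & MASK24


def product_exceeds_24_bits(aj: int, ak: int) -> bool:
    """Software check a programmer might apply, since the unit has none."""
    return aj * ak > MASK24
```

Multiplying 2^12 by 2^12 gives 2^24, which does not fit in 24 bits, so the
sketch returns 0 for that product.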



SCALAR FUNCTIONAL UNITS 

Scalar functional units perform operations on 64-bit operands obtained 
from S registers and, in most cases, deliver the 64-bit results to an S 
register. The exception is the Population/Leading Zero Count functional 
unit which delivers its 7-bit result to an A register. 

Four functional units are exclusively associated with scalar operations 
and are described below. Three functional units are used for both scalar 
and vector operations and are described in the section on Floating-point 
Functional Units. 



Scalar Add functional unit 

The Scalar Add functional unit performs 64-bit integer addition and 
subtraction and executes instructions 060 and 061. Addition and 
subtraction are performed in a similar manner. The twos complement 
subtraction for instruction 061 occurs when the ones complement of the 
Sk operand is added to the Sj operand. Then a 1 is added in the 
low-order bit position of the result. No overflow is detected in the 
Scalar Add functional unit. 

The Scalar Add functional unit time is 3 CPs. 

Scalar Shift functional unit 

The Scalar Shift functional unit shifts the entire 64-bit contents of an 
S register or shifts the double 128-bit contents of two concatenated S 
registers. Shift counts are obtained from an A register or from the jk 
portion of the instruction. Shifts are end off with zero fill. For a 
double shift, a circular shift is effected if the shift count does not 
exceed 64 and the i and j designators are equal and nonzero. 
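
The circular shift effect can be seen in a Python sketch of a double left
shift. The function name is hypothetical, and the sketch assumes that the
two registers are concatenated with the first as the high-order half and
that the high-order 64 bits of the shifted value are returned.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def double_shift_left(si: int, sj: int, count: int) -> int:
    """Shift the 128-bit value si:sj left end off with zero fill.

    Returns the high-order 64 bits (an assumption of this sketch).
    """
    combined = ((si & MASK64) << 64) | (sj & MASK64)
    shifted = (combined << count) & MASK128
    return shifted >> 64
```

When both halves hold the same value and the count does not exceed 64,
the bits shifted out of the top of the high half are replaced by the same
bits arriving from the low half, which is a circular shift.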

The Scalar Shift functional unit executes instructions 052 through 057. 
Single-shift instructions (052 through 055) have a functional unit time 
of 2 CPs. Double-shift instructions (056 and 057) have a functional unit 
time of 3 CPs.



Scalar Logical functional unit

The Scalar Logical functional unit performs bit-by-bit manipulation of 
64-bit quantities obtained from S registers. It executes instructions 
042 through 051, the mask and Boolean instructions. Instructions 042
through 051 have a functional unit time of 1 CP. 

Scalar Population/Parity/Leading Zero functional unit

This functional unit executes instructions 026 and 027. Instruction
026ij0 counts the number of bits having a value of 1 in the operand
and has a functional unit time of 4 CPs. Instruction
026ij1 returns a 1-bit population parity count (even parity) of the
Sj register's contents. Instruction 027 counts the number of 0 bits
preceding the first 1 bit in the operand and has a functional unit time of 3
CPs. For these instructions, the 64-bit operand is obtained from an S 
register and the 7-bit result is delivered to an A register. 
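
The three results can be sketched in Python as follows. The function
names are hypothetical, and the parity result is taken to be the
low-order bit of the population count.

```python
MASK64 = (1 << 64) - 1


def population_count(s: int) -> int:
    """Number of 1 bits in a 64-bit operand (0 through 64)."""
    return bin(s & MASK64).count("1")


def population_parity(s: int) -> int:
    """Low-order bit of the population count."""
    return population_count(s) & 1


def leading_zero_count(s: int) -> int:
    """Number of 0 bits preceding the first 1 bit (64 for a zero operand)."""
    return 64 - (s & MASK64).bit_length()
```

Because a count can be as large as 64, seven bits are needed to hold the
result.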



VECTOR FUNCTIONAL UNITS 

Most vector functional units perform operations on operands obtained from 
one or two V registers or from a V register and an S register. The 
Reciprocal, Shift, and Population/Parity functional units, which require 
only one operand, are exceptions. Results from a vector functional unit 
are delivered to a V register. 

Successive operand pairs are transmitted each CP to a functional unit. 
The corresponding result emerges from the functional unit n CPs later, 
where n is the functional unit time and is constant for a given 
functional unit. The VL register determines the number of operand pairs 
to be processed by a functional unit.
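
The relationship between operand delivery and result arrival can be
sketched in Python. The function name is hypothetical, and the sketch
ignores instruction issue and register access overhead; it models only
the one-pair-per-CP delivery and the fixed functional unit time.

```python
def result_times(first_operand_cp: int, unit_time: int,
                 vector_length: int) -> list:
    """CP at which each element's result emerges from a segmented unit.

    Element e enters at first_operand_cp + e and emerges unit_time CPs later.
    """
    return [first_operand_cp + e + unit_time for e in range(vector_length)]
```

With a functional unit time of 3 CPs and four operand pairs delivered
starting at CP 0, results emerge at CPs 3, 4, 5, and 6.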

The functional units described in this section are exclusively associated 
with vector operations. Three functional units are associated with both 
vector operations and scalar operations and are described in the 
subsection entitled Floating-point Functional Units. When a 
Floating-point functional unit is used for a vector operation, the 
general description of vector functional units given in the subsection 
applies. 



Vector functional unit reservation 

A functional unit engaged in a vector operation remains busy during each 
CP and cannot participate in other operations. In this state, the 
functional unit is reserved. Other instructions requiring the same 
functional unit will not issue until the previous operation is completed
(with the exception of instructions 140 to 145, which may use either of
the vector logical units). When the vector operation completes, the
reservation is dropped and the functional unit is then available for
another operation. A vector functional unit is reserved for (VL) + 4 CPs.



Vector Add functional unit 

The Vector Add functional unit performs 64-bit integer addition and 
subtraction for a vector operation and delivers the results to elements 
of a V register. The unit executes instructions 154 through 157. 
Addition and subtraction are performed in a similar manner. For 
subtraction operations (156 and 157), the Vk operand is complemented
before addition and a 1 is added into the low-order bit position of the 
result. No overflow is detected by the unit. 

The Vector Add functional unit time is 3 CPs. 



Vector Shift functional unit 

The Vector Shift functional unit shifts the entire 64-bit contents of a V 
register element or the 128-bit value formed from two consecutive 
elements of a V register. Shift counts are obtained from an A register 
and are end off with zero fill. 

All shift counts are considered positive unsigned integers. If any bit 
higher than 2^6 is set, the shifted result is all zeros.

The Vector Shift functional unit executes instructions 150 through 153. 
The functional unit time is 4 CPs for instruction 152, and the functional 
unit time is 3 CPs for instructions 150, 151, and 153. 



Vector logical functional units 

The CRAY X-MP Series Model 48 has two vector logical functional units: a
Full Vector Logical unit and a Second Vector Logical unit.

The Full Vector Logical unit performs bit-by-bit manipulations of the
64-bit quantities for instructions 140 through 147, logical operations 
associated with the vector mask instruction 175, and index generation. 
The Second Vector Logical unit performs bit-by-bit manipulations of 
64-bit quantities for instructions 140 through 145 only. 

Since both vector logical units can be used for instructions 140 through
145, when one of these instructions issues to the CIP register, a
selection is made to determine which vector functional unit will be
used. Once a selection has been made, the instruction is committed to
using that functional unit.

Normally, the instructions will attempt to issue first to the Second
Vector Logical unit and then, if the unit is busy, attempt to issue to
the Full Vector Logical unit. If both units are busy, the first unit to
clear is selected. The Second Vector Logical unit may be busy because of
another instruction or because the unit is disabled (see below). If there
are other conflicts (register reservations) for the Second Vector Logical 
unit at the time the selection is made, the instructions will issue to 
the Full Vector Logical unit even though the Second Vector Logical unit 
clears before the instruction issues. When the Second Vector Logical 
unit is disabled, the functional unit busy always appears set and causes 
all 140 through 145 instructions to issue to the Full Vector Logical unit. 

When the Second Vector Logical unit is enabled, it shares input and 
output data paths and the same functional unit busy with the 
Floating-point Multiply unit, so they cannot be used simultaneously. 
Also, since the Second Vector Logical unit ties up the Floating-point 
Multiply unit, some codes that rely on floating-point products may run 
slower if the Second Vector Logical unit is enabled. 

The Second Vector Logical unit can be enabled and disabled through 
software by clearing bit of word 3 in the Exchange Package of a user 
program. If the bit is clear, the unit is disabled and only the Full 
Vector Logical unit is available to instructions 140 through 145. 

Because instruction 175 uses the Full Vector Logical unit, it cannot be 
chained with instructions 146 and 147, nor may it be chained with 
instructions 140 through 145 unless the Second Vector Logical unit is 
enabled and the instructions issue through that unit. 

The Full Vector Logical functional unit time is 2 CPs; the Second Vector 
Logical functional unit time is 4 CPs. 



Vector Population/Parity functional unit 

The Vector Population/Parity functional unit counts the 1 bits in each 
element of the source V register. The total number of 1 bits is the 
population count. This population count can be an odd or an even number, 
as shown by its low-order bit. 

Instructions 174ij1 (vector population count) and 174ij2 (vector
population count parity) use the same operation code as the vector
reciprocal approximation instruction. Some restrictions for the
Reciprocal Approximation functional unit also apply for vector population
instructions (see subsection on Reciprocal Approximation). The vector
population count instruction delivers the total population count to 
elements of the destination V register. 

The vector population count parity instruction delivers the low-order bit 
of the count to the destination V register. The Vector Population/Parity 
functional unit time is 5 CPs.



FLOATING-POINT FUNCTIONAL UNITS

Three floating-point functional units perform floating-point arithmetic 
for scalar and vector operations. When executing a scalar instruction, 
operands are obtained from S registers and results are delivered to an S
register. When executing most vector instructions, operands are obtained 
from pairs of V registers or from an S register and a V register. 
Results are delivered to a V register. An exception is the Reciprocal 
Approximation unit requiring only one input operand. 

Information on floating-point out-of-range conditions is contained in the 
subsection on Floating-point Arithmetic. 

Floating-point Add functional unit 

The Floating-point Add functional unit performs addition or subtraction 
of 64-bit operands in floating-point format and executes instructions 
062, 063, and 170 through 173. A result is normalized even when operands 
are unnormalized. (Normalized floating-point numbers are described in 
the subsection on Floating-point Arithmetic.) Out-of-range exponents are 
detected as described in the subsection on Floating-point Arithmetic. 

Floating-point Add functional unit time is 6 CPs. 

Floating-point Multiply functional unit 

The Floating-point Multiply functional unit executes instructions 064 
through 067 and 160 through 167. These instructions provide for full- 
and half-precision multiplication of 64-bit operands in floating-point 
format and for computing two minus a floating-point product for 
reciprocal iterations. 

The half-precision product is rounded; the full-precision product can be 
rounded or not rounded. 

Input operands are assumed to be normalized. The Floating-point Multiply 
functional unit delivers a normalized result only if both input operands 
are normalized. 

Out-of-range exponents are detected as described in the subsection on 
floating-point arithmetic. However, if both operands have zero 
exponents, the result is considered as an integer product, is not 
normalized, and is not considered out-of-range. This case provides a 
fast method of computing a 48-bit integer product, although the operands
in this case must be shifted before the multiply operation.

Because the Second Vector Logical functional unit and the Floating-point
Multiply functional unit share input and output data paths, they cannot
be used simultaneously. A reservation on one is a reservation on the
other.

The Floating-point Multiply functional unit time is 7 CPs. 



Reciprocal Approximation functional unit 

The Reciprocal Approximation functional unit finds the approximate 
reciprocal of a 64-bit operand in floating-point format. The unit 
executes instructions 070 and 174ij0. Since the Vector Population/Parity
functional unit shares some logic with this unit, the k designator must
be 0 for the reciprocal approximation instruction to be recognized.

The input operand is assumed to be normalized and if so the result is 
correct. The high-order bit of the coefficient is not tested but is 
assumed to be a 1. Out-of-range exponents are detected as described
under Floating-point Arithmetic. 

The Reciprocal Approximation functional unit time is 14 CPs. 



ARITHMETIC OPERATIONS 

Functional units in a CPU perform either twos complement integer 
arithmetic or floating-point arithmetic. 



INTEGER ARITHMETIC 

All integer arithmetic, whether 24 bits or 64 bits, is twos complement 
and is represented in the registers as illustrated in figure 4-4. The 
Address Add and Address Multiply functional units perform 24-bit 
arithmetic. The Scalar Add and the Vector Add functional units perform 
64-bit arithmetic. 

Multiplication of two scalar (64-bit) integer operands is accomplished by 
using the floating-point multiply instruction and one of the two methods 
that follows. The method used depends on the magnitude of the operands 
and the number of bits to contain the product. 

If the operands are nonzero only in the 24 least significant bits, the 
two integer operands can be multiplied by shifting them each left 24 bits 
before the multiply operation. (The Floating-point Multiply functional 
unit recognizes the conditions where both operands have zero exponents as 
a special case.) The Floating-point Multiply functional unit returns the
high-order 48 bits of the product of the coefficients as the coefficient
of the result and leaves the exponent field 0. See figure 4-7. If the
operand coefficients are generated by other than shifting so the
low-order 24 bits would be nonzero, the low-order 48 bits of the product
could be nonzero, and the high-order 48 bits (the returned part)
could be one larger than expected because a truncation compensation
constant is always added during a multiply.

Twos complement integer (24 bits): bits 2^23 (sign) through 2^0

Twos complement integer (64 bits): bits 2^63 (sign) through 2^0

Figure 4-4. Integer data formats

If the operands are greater than 24 bits, multiplication is done by 
forming multiple partial products and then shifting and adding the 
partial products. 
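
The 24-bit method described above can be sketched in Python. The function
name is hypothetical. The sketch handles nonnegative operands only and
omits the truncation compensation constant, on the assumption that it
cannot affect the returned bits when the low-order 48 bits of the product
are zero.

```python
MASK48 = (1 << 48) - 1


def integer_multiply_24(a: int, b: int) -> int:
    """Multiply two 24-bit integers through the coefficient multiplier."""
    if not (0 <= a < 1 << 24 and 0 <= b < 1 << 24):
        raise ValueError("operands must be nonzero only in the low 24 bits")
    coeff_a = a << 24                   # shift each operand left 24 bits
    coeff_b = b << 24
    full_product = coeff_a * coeff_b    # up to 96 bits wide
    return (full_product >> 48) & MASK48  # high-order 48 bits are returned
```

Because each operand is shifted left 24 bits, the 96-bit product is the
integer product shifted left 48 bits, so its high-order 48 bits are the
integer product itself.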

Division is done by algorithm; the particular algorithm used depends on 
the number of bits in the quotient. The quickest and most frequently 
used method is to convert the numbers to floating-point format and then 
use the floating-point functional units. 



FLOATING-POINT ARITHMETIC 

Floating-point numbers are represented in a standard format throughout 
the CPU. This format is a packed representation of a binary coefficient 
and an exponent (power of two). The coefficient is a 48-bit signed
fraction. The sign of the coefficient is separated from the rest of the 
coefficient as shown in figure 4-5. Since the coefficient is signed 
magnitude, it is not complemented for negative values.

Coefficient sign: bit 2^63
Exponent: bits 2^62 through 2^48
Coefficient: bits 2^47 through 2^0 (binary point to the left of bit 2^47)

Figure 4-5. Floating-point data format

The exponent portion of the floating-point format is represented as a
biased integer in bits 2^62 through 2^48. The bias that is added to
the exponents is 40000₈. The positive range of exponents is 40000₈
through 57777₈. The negative range of exponents is 37777₈ through
20000₈. Thus, the unbiased range of exponents is the following (note
the negative range is one larger):

2^-20000₈ through 2^+17777₈

In terms of decimal values, the floating-point format of the CRAY X-MP
4-processor allows the accurate expression of numbers to about 15 decimal
digits in the approximate decimal range of 10^-2466 through 10^+2466.

A zero value or an underflow result is not biased and is represented as a 
word of all zeros. 
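
The packed format can be illustrated with a Python sketch that unpacks a
64-bit word and computes its exact value. The function names are
hypothetical, and range errors are not modeled.

```python
from fractions import Fraction

BIAS = 0o40000  # exponent bias


def unpack(word: int) -> tuple:
    """Split a 64-bit word into (sign, biased exponent, coefficient)."""
    sign = (word >> 63) & 1
    exponent = (word >> 48) & 0o77777       # 15-bit biased exponent
    coefficient = word & ((1 << 48) - 1)    # 48-bit fraction
    return sign, exponent, coefficient


def value(word: int) -> Fraction:
    """Exact value of a packed word; sign and magnitude, not complemented."""
    sign, exponent, coefficient = unpack(word)
    if exponent == 0 and coefficient == 0:
        return Fraction(0)
    magnitude = Fraction(coefficient, 1 << 48) * Fraction(2) ** (exponent - BIAS)
    return -magnitude if sign else magnitude
```

For example, a biased exponent of 40001₈ with only the high-order
coefficient bit set represents 0.5 x 2^1, or 1.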

A negative 0 is not generated by any floating-point functional unit,
except in the case where a negative 0 is one operand going into the
Floating-point Multiply functional unit.

Normalized floating-point numbers, floating-point range errors, 
double-precision numbers, and the addition, multiplication, and division 
algorithms are described in the remainder of this subsection. 

Normalized floating-point numbers 

A nonzero floating-point number is normalized if the most significant bit 
of the coefficient is nonzero. This condition implies the coefficient 
has been shifted as far left as possible and the exponent adjusted 
accordingly. Therefore, the floating-point number has no leading zeros 
in the coefficient. The exception is that a normalized floating-point 
zero is all zeros. 
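
Normalization as defined above can be sketched in Python. The function
names are hypothetical; the sketch works on a biased exponent and a
48-bit coefficient and does not model range errors.

```python
def is_normalized(exponent: int, coefficient: int) -> bool:
    """True for an all-zero number or when coefficient bit 2^47 is set."""
    if coefficient == 0:
        return exponent == 0
    return bool((coefficient >> 47) & 1)


def normalize(exponent: int, coefficient: int) -> tuple:
    """Shift the coefficient left as far as possible, adjusting the exponent."""
    if coefficient == 0:
        return 0, 0
    shift = 48 - coefficient.bit_length()   # number of leading zero bits
    return exponent - shift, coefficient << shift
```

For example, the integer 1 held in the coefficient field with a biased
exponent of 40060₈ normalizes to a coefficient with only bit 2^47 set and
a biased exponent of 40001₈.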

When a floating-point number is created by inserting an exponent of
40060₈ into a 48-bit integer word, the result should be normalized
before being used in a floating-point operation. Normalization is
accomplished by adding the unnormalized floating-point operand to 0.
Since S0 provides a 64-bit 0 when used in the Sj field of an
instruction, an operand in Sk is normalized using the 062i0k
instruction. Si, which can be Sk, contains the normalized result.

The 170i0k instruction normalizes Vk into Vi.



Floating-point range errors 

Overflow of the floating-point range is indicated by an exponent value of 
60000₈ or greater in packed format. Detection of the overflow
condition initiates an interrupt if the Floating-point Mode flag is set 
in the Mode register and monitor mode is not in effect. The 
Floating-point Mode flag can be set or cleared by a user mode program. 

The Cray Operating System (COS) keeps a bit in a table to indicate the 
condition of the mode bit. System software manipulates the mode bit and 
uses the table bit to indicate how the mode should be left for the user. 
Therefore, the user usually needs to put the appropriate bit in the table 
if the user changes the mode. 

Floating-point range error conditions are detected by the floating-point 
functional units as described in the following paragraphs. 

Floating-point Add functional unit - A floating-point add range error
condition is generated for scalar operands when the larger incoming
exponent is greater than or equal to 60000₈. This condition sets the
Floating-point Error flag with an exponent of 60000₈ being sent to the
result register along with the computed coefficient, as in the following
example:

 60000.4xxxxxxxxxxxxxxx Range error
+57777.4xxxxxxxxxxxxxxx
 60000.6xxxxxxxxxxxxxxx Result register



NOTE 

If the result of an add or subtract operation is less 
than the machine minimum, the error is suppressed (even 
though both operands have exponents greater than or 
equal to 60000₈) because the machine minimum takes
precedence in error detection.



Floating-point Multiply functional unit - Whether or not out-of-range
conditions occur, and how they are handled, can be determined using the
exponent matrix shown in figure 4-6. The exponent of the result, for any
set of exponents, falls into one of seven unique zones. A description of
each zone is given below.



[Figure 4-6 diagram not reproduced: a matrix of result exponent zones,
with the exponent of operand 1 along one axis.]

Figure 4-6. Exponent matrix for Floating-point Multiply unit



Zone   Description

1      Indicates a simple integer multiply; no fault is possible.

2      These exponents would result in an underflow condition. It is
       flagged as such, and the result is set to +0. (Multiply by 0
       is in this group.)

3      Underflow may occur on this boundary. The final exponent can
       be 17777₈ or 20000₈ depending on whether a normalized
       shift is required. If the exponent is 17777₈ and no
       normalized shift is required, the underflow will not be
       detected, and the coefficient and exponent will not be zeroed
       out. Underflow detection is done on the exponent used with
       the unshifted product coefficient.

4      The use of an underflow exponent is allowed if the final
       result is within the range 20000₈ to 57777₈.

5      This is the normal operand range and normal results are
       produced.

6      Overflow is flagged on this boundary. If a normalized shift
       is required, the value should be within bounds with a 57777₈
       exponent. However, since overflow is detected using the
       exponent for the unnormalized shift condition (which is
       60000₈), a 60000₈ will be inserted in the product as the
       final exponent.

7      Within this zone, an overflow fault is flagged and the product
       exponent is set to 60000₈.



NOTE 

If either operand is less than the machine minimum, the 
error is suppressed (even though the other operand can 
be out of range) because the operand that is less than 
the machine minimum takes precedence in error detection. 



Out-of-range conditions are tested before normalizing in the
Floating-point Multiply functional unit. As shown above, if both
incoming exponents are equal to 0, the operation is treated as an integer
multiply. The result is treated normally with no normalization shift of
the result allowed. The result is a 48-bit quantity starting with bit
2^47. When using this feature, the operands should be considered as
24-bit integers in bits 2^47 through 2^24. In figure 4-7, if operand 1
is 4 and operand 2 is 6, a 48-bit result of 30₈ is produced. Bit 2^63
obeys the usual rules for multiplying signs and the result is a sign and
magnitude integer. Note the form of integers (see figure 4-4) accepted
by the integer add and subtract and expected by the software is twos
complement, not sign and magnitude. Therefore, negative products must be
converted.

If bits 2^23 through 2^0 in operands 1 and 2 of figure 4-7 have any 1
bits, the product might be one (2^0) too large because a truncation
compensation constant is added during the multiply process. (The
following paragraphs discuss the truncation constant and its use.) The
size of the shaded area in operands 1 and 2 (figure 4-7) does not need to
be the same for both operands. To get a correct product, the only
requirement is that the sum of the number of bits in the shaded area is
48 bits or more. If the sum is more than 48 bits, the binary point in
the product is the number of places to the left that the sum is in excess
of 48 (that is, assuming the operand binary points are at the left
boundary of the shaded areas).



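[Figure 4-7 diagram not reproduced. It shows operand 1 (sign, value 04),
operand 2 (sign, value 06), and the result (sign, value 030), with bit
positions 2^63, 2^47, and 2^23 marked and the low-order operand bits
labeled as having to be 0 to ensure the product is correct.]

Figure 4-7. Integer multiply in Floating-point
Multiply functional unit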

Floating-point Reciprocal Approximation functional unit - For the
Floating-point Reciprocal Approximation functional unit, an incoming
operand with an exponent less than or equal to 20001₈ or greater than
or equal to 60000₈ causes a floating-point range error. The error flag
is set and an exponent of 60000₈ and the computed coefficient are sent
to the result register.



Double-precision numbers 

The CPU does not provide special hardware for performing double- or 
multiple-precision operations. Double-precision computations with 95-bit 
accuracy are available through software routines provided by Cray 
Research, Inc.



Addition algorithm

Floating-point addition or subtraction is performed in a 49-bit register
(figure 4-8). Trial subtraction of the exponents selects the operand to
be shifted down for aligning the operands. The larger exponent operand
carries the sign. The coefficient of the number with the smaller 
exponent is shifted right to align with the coefficient of the number 
with the larger exponent. Bits shifted out of the register are lost; no 
roundup takes place. If the sum carries into the high-order bit, the 
low-order bit is discarded and an appropriate exponent adjustment is 
made. All results are normalized and if the result is less than the 
machine minimum, the error is suppressed. 



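[Figure 4-8 diagram not reproduced: the 49-bit register used for the
coefficient sum, with bits shifted out of the register discarded.]

Figure 4-8. 49-bit floating-point addition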



The Floating-point Add functional unit normalizes any floating-point 
number within the format of the Cray floating-point number system. The 
functional unit right shifts 1 or left shifts up to 48 per result to 
normalize the result. 

One zero operand and one valid operand can be sent to the Floating-point 
Add functional unit, and the valid operand is sent through the unit 
normalized. Concurrently, the functional unit checks for overflow and/or 
underflow; underflow results are not flagged as errors. 
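
The alignment, carry, and normalization steps can be sketched in Python.
This is a simplified illustration, not the hardware algorithm: the
function name is hypothetical, both operands are assumed to be positive,
and range checking is omitted.

```python
COEFF_BITS = 48


def add_magnitudes(exp_a: int, coeff_a: int, exp_b: int, coeff_b: int) -> tuple:
    """Add two positive numbers given as (biased exponent, 48-bit coefficient)."""
    # Make operand a the one with the larger exponent.
    if exp_a < exp_b:
        exp_a, coeff_a, exp_b, coeff_b = exp_b, coeff_b, exp_a, coeff_a
    # Align the smaller operand; bits shifted out are lost, with no roundup.
    coeff_b >>= exp_a - exp_b
    total = coeff_a + coeff_b           # may occupy 49 bits
    if total >> COEFF_BITS:             # carry into the high-order bit
        total >>= 1                     # low-order bit is discarded
        exp_a += 1
    if total == 0:
        return 0, 0
    shift = COEFF_BITS - total.bit_length()  # normalize any leading zeros
    return exp_a - shift, total << shift
```

Adding 1 to 1 in this sketch produces a carry, so the coefficient is
shifted right one place and the exponent is increased by one. Adding a
value whose alignment shift is 48 or more leaves the larger operand
unchanged.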



Multiplication algorithm 

The Floating-point Multiply functional unit has the two 48-bit 
coefficients as input into a multiply pyramid (see figure 4-9). If the
coefficients are both normalized, then a full product is either 95 bits 
or 96 bits, depending on the value of the coefficients. A 96-bit product 
is normalized as generated. A 95-bit product requires a left shift of 
one to generate the final coefficient. If the shift is done, the final 
exponent is reduced by one to reflect the shift. The following 
discussion and the power of two designators used assumes that the product 
generated is in its final form; that is, no shift was required. On the 
system, the pyramid truncates part of the low-order bits of the 96-bit 
product. To adjust for this truncation, a constant is unconditionally 
added above the truncation. The average value of this truncation is 9.25
x 2^-56, which was determined by adding all carries produced by all
possible combinations that could be truncated and dividing the sum by
the number of possible combinations. Nine carries are injected at the
2^-56 position to compensate for the truncated bits. The effect of the
truncation without compensation is at most a result coefficient one
smaller than expected. With compensation, the results range from one too
large to one too small in the 2^-48 bit position with approximately 99
percent of the values having zero deviation from what would have been
generated had a full 96-bit pyramid been present. The multiplication is
commutative; that is, A times B equals B times A.

Rounding is optional; truncation compensation is not. The rounding
method used adds a constant so that it is 50 percent high (0.25 x 2^-48;
high) 38 percent of the time and 25 percent low (0.125 x 2^-48; low) 62
percent of the time resulting in near zero average rounding error. In a
full-precision rounded multiply, 2 round bits are entered into the
pyramid at bit positions 2^-50 and 2^-51 and allowed to propagate up the
pyramid.

For a half-precision multiply, round bits are entered into the pyramid at
bit positions 2^-32 and 2^-31. A carry resulting from this entry is
allowed to propagate up and the 29 most significant bits of the
normalized result are transmitted back.

The variation due to this truncation and rounding is in the range:

     -0.23 x 2^-48 to +0.57 x 2^-48

  or -8.17 x 10^-16 to +20.25 x 10^-16.

With a full 96-bit pyramid and rounding equal to one-half the least
significant bit, the variation would be expected to be:

     -0.5 x 2^-48 to +0.5 x 2^-48
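
The decimal form of the range quoted above can be checked with a few lines of Python. This is only an arithmetic illustration; the variable names are chosen here and are not part of the system.

```python
# Illustrative arithmetic check; names are assumptions, not manual terms.
LSB = 2.0 ** -48           # weight of the least significant coefficient bit

low = -0.23 * LSB          # lower bound of the stated variation
high = 0.57 * LSB          # upper bound of the stated variation

# Express both bounds in units of 10^-16, as the text does.
low_e16 = low / 1e-16
high_e16 = high / 1e-16

print(f"{low_e16:.2f} x 10^-16 to {high_e16:+.2f} x 10^-16")
```

Both bounds agree with the decimal figures given in the text when rounded to two places.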



Division algorithm 

The system performs floating-point division through reciprocal 
approximation, facilitating hardware implementation of a fully segmented 
functional unit. Because of this segmentation, operands enter the 
reciprocal unit during each CP. In vector mode, results are produced at 
a 1-CP rate and are used in other vector operations during chaining 
because all functional units in the system have the same result rate. 
The reciprocal approximation is based on Newton's method. 






[Pyramid diagram: product bit designations†, shown both for the case
where a shift is needed to normalize the coefficient and for the case
where a shift is not needed to normalize the coefficient]

(1) hh = 11₂ for half-precision round, 00₂ for full-precision rounded
    or full-precision unrounded multiply

(2) ff = 11₂ for full-precision round, 00₂ for half-precision rounded
    or full-precision unrounded multiply

(3) Truncation compensation constant, 1001₂, used for all multiplies



Figure 4-9. Floating-point multiply partial-product sums pyramid



† Bit designations are used in the explanation of the Floating-point
  Multiply functional unit operation.

Newton's method - The division algorithm is an application of Newton's
method for approximating the real roots of an arbitrary equation
F(x) = 0, for which F(x) must be twice differentiable with a continuous
second derivative. The method requires making an initial approximation
(guess), x_0, sufficiently close to the true root, x_t, being sought
(see figure 4-10). For a better approximation, a tangent line is drawn
to the graph of y = F(x) at the point (x_0, F(x_0)). The x intercept
of this tangent line is the better approximation x_1. This can be
repeated using x_1 to find x_2, etc.



[Graph of y = F(x) with the tangent line drawn at the point (x_0, F(x_0))]

Figure 4-10. Newton's method

Derivation of the division algorithm 

A definition for the derivative F'(x) of a function F(x) at point x_t is

     F'(x_t) = lim (x -> x_t) [F(x) - F(x_t)] / (x - x_t)

if this limit exists. If the limit does not exist, F(x) is not
differentiable at the point x_t.

For any point x_i near to x_t,

     F'(x_t) ≈ [F(x_i) - F(x_t)] / (x_i - x_t)

where ≈ means "approximately equal to".

This approximation improves as x_i approaches x_t. Let x_i stand for an
approximate solution and let x_t stand for the true answer being sought.
The exact answer is then the value of x that makes F(x) equal 0. This is
the case when x = x_t; therefore F(x_t) in the equation above can be
replaced by 0, giving the following approximation:

     F'(x_t) ≈ F(x_i) / (x_i - x_t)                 Approximation (1)

Notice that x_t - x_i is the correction applied to an approximate answer,
x_i, to give the right answer, since x_i + (x_t - x_i) equals x_t.
Solving approximation (1) for (x_t - x_i) gives:

     x_t - x_i = correction ≈ -F(x_i) / F'(x_t)

that is, -F(x_i) / F'(x_t) is the approximate correction. Because x_t is
not known, F'(x_t) is in turn approximated by F'(x_i).

If this quantity is substituted into the approximation, then:

     x_t ≈ (x_i + approximate correction) = x_{i+1}

This gives the following equation:

     x_{i+1} = x_i - F(x_i) / F'(x_i)               Equation (1)

where x_{i+1} is a better approximation than x_i to the true value, x_t,
being sought. The exact answer is generally not obtained at once because
the correction term is not generally exact. However, the operation is
repeated until the answer becomes sufficiently close for practical use.

To make use of Newton's method to find the reciprocal of a number B,
simply use F(x) = (1/x - B).

First calculating F'(x):

     F'(x) = d/dx (1/x - B) = -1/x^2

Thus, for any point x_1 ≠ 0,

     F'(x_1) = -1/x_1^2

Choosing for x_1 a value near 1/B and applying equation (1),

     x_2 = x_1 - (1/x_1 - B) / (-1/x_1^2)

     x_2 = x_1 + x_1 - x_1^2 B

     x_2 = 2 x_1 - x_1^2 B = x_1 (2 - x_1 B).

On the system, x_1 times the quantity in parentheses is performed by a
floating-point multiply. 2 - x_1 B is performed by the reciprocal
approximation instruction. x_1 is the x near 1/B and is formed by the
half-precision reciprocal approximation instruction.

This approximation technique using Newton's method is implemented in the
system. A hardware table look-up provides an initial guess, x_0, to
start the process.

     x_0(2 - x_0 B)    1st approximation, I1   \
     x_1(2 - x_1 B)    2nd approximation, I2    }  Done in reciprocal unit
     x_2(2 - x_2 B)    3rd approximation, I3   /

     x_3(2 - x_3 B)    4th approximation           Done with software

The system's Reciprocal Approximation functional unit performs three
iterations: I1, I2, and I3. I1 is accurate to 8 bits and is found after
a table look-up to choose the initial guess, x_0. I2 is the second
iteration and is accurate to 16 bits. I3 is the final (third) iteration
answer of the Reciprocal Approximation functional unit, and its result is
accurate to 30 bits.

A fourth iteration uses a special instruction within the Floating-point 
Multiply functional unit to calculate the correction term. This 
iteration is used to increase accuracy of the reciprocal unit's answer to 
full precision. A fifth iteration should not be done. 
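
The way each iteration roughly doubles the number of correct bits can be illustrated with a short Python sketch. It is not a model of the hardware: it uses ordinary double-precision arithmetic, and the function name and the starting guess are assumptions chosen for illustration.

```python
def newton_reciprocal(b, x0, iterations):
    """Refine x0 toward 1/b using x_{i+1} = x_i * (2 - x_i * b)."""
    x = x0
    for _ in range(iterations):
        x = x * (2.0 - x * b)
    return x


if __name__ == "__main__":
    b = 3.0
    x0 = 0.33  # assumed initial guess; relative error about 1 percent
    for n in range(4):
        x = newton_reciprocal(b, x0, n)
        rel_err = abs(1.0 - x * b)
        print(f"after {n} iteration(s): x = {x:.17f}, "
              f"relative error = {rel_err:.3e}")
```

Since 1 - x_{i+1} B = (1 - x_i B)^2, the relative error is squared on each pass, which is consistent with the accuracy roughly doubling from one iteration to the next.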

The division algorithm that computes S1/S2 to full precision requires the
following operations:

     S3 = 1/S2               Performed by the Reciprocal Approximation
                             functional unit

     S4 = (2 - (S3 * S2))    Performed by the Floating-point Multiply
                             functional unit in iteration mode

     S5 = S4 * S3            Performed by the Floating-point Multiply
                             functional unit using full-precision. S5 now
                             equals 1/S2 to 48-bit accuracy.

     S6 = S5 * S1            Performed by the Floating-point Multiply
                             functional unit using full-precision rounded

The reciprocal approximation at step 1 is correct to 30 bits. An 
additional Newton iteration (fourth iteration) at operations 2 and 3 
increases this accuracy to 48 bits. This iteration answer is applied as 
an operand in a full-precision rounded multiply operation to obtain the 
quotient accurate to 48 bits. Additional iterations should not be 
attempted since erroneous results are possible. 
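
The four operations can be followed in a small Python sketch. The helper recip_approx_30 is a hypothetical stand-in for the Reciprocal Approximation functional unit: it simply truncates 1/S2 to about 30 bits, which is an assumption made for illustration and not how the hardware forms its result. Double-precision floats stand in for the 48-bit coefficient.

```python
import math


def recip_approx_30(b):
    """Stand-in for the reciprocal unit: 1/b truncated to about 30 bits.

    Assumption for illustration only; the hardware uses a table look-up
    followed by three iterations, not truncation of an exact reciprocal.
    """
    mantissa, exponent = math.frexp(1.0 / b)
    mantissa = math.floor(mantissa * 2**30) / 2**30
    return math.ldexp(mantissa, exponent)


def divide_full(s1, s2):
    """Follow the four operations listed above to form s1 / s2."""
    s3 = recip_approx_30(s2)   # S3 = 1/S2 (about 30 bits)
    s4 = 2.0 - s3 * s2         # S4 = 2 - (S3 * S2), the correction term
    s5 = s4 * s3               # S5 = S4 * S3, the refined reciprocal
    s6 = s5 * s1               # S6 = S5 * S1, the quotient
    return s6


if __name__ == "__main__":
    s1, s2 = 355.0, 113.0
    print("30-bit reciprocal only:", s1 * recip_approx_30(s2))
    print("full sequence:         ", divide_full(s1, s2))
    print("reference:             ", s1 / s2)
```

The single correction step is enough to carry a 30-bit reciprocal past 48 bits, which is why only one programmed iteration is needed.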



******************************************************* 

CAUTION 

The reciprocal iteration is designed for use once with 
each half-precision reciprocal generated. If the 
fourth iteration (the programmed iteration) results in 
an exact reciprocal or if an exact reciprocal is 
generated by some other method, performing another 
iteration results in an incorrect final reciprocal. 

******************************************************* 



Where 29 bits of accuracy are sufficient, the reciprocal approximation 
instruction is used with the half-precision multiply to produce a 
half-precision quotient in only two operations. 

     S3 = 1/S2               Performed by the Reciprocal Approximation
                             functional unit

     S6 = S1 * S3            Performed by the Floating-point Multiply
                             functional unit in half-precision






The 19 low-order bits of the half-precision results are returned as zeros 
with a rounding applied to the low-order bit of the 29-bit result. 

Another method of computing divisions is as follows:

     S3 = 1/S2               Performed by the Reciprocal Approximation
                             functional unit

     S5 = S1 * S3            Performed by the Floating-point Multiply
                             functional unit

     S4 = (2 - (S3 * S2))    Performed by the Floating-point Multiply
                             functional unit

     S6 = S4 * S5            Performed by the Floating-point Multiply
                             functional unit

A scalar quotient is computed in 29 CPs since operations 2 and 3 issue in 
successive CPs. With this method, the correction to reach a 
full-precision reciprocal is applied after the numerator is multiplied 
times the half-precision reciprocal rather than before. 

A vector quotient using this procedure requires less than four vector 
times since operations 1 and 2 are chained together. This overlaps one 
of the multiply operations. (A vector time is 1 CP for each element in 
the vector.) 



******************************************************* 

CAUTION 

The coefficient of the reciprocal produced by the
alternate method can be as much as 2 x 2^-48 different
from the first method described for generating
full-precision reciprocals. This difference can occur 
because one method can round up as much as twice while 
the other method may not round at all. One round can 
occur while the correction is generated and the second 
round can occur when producing the final quotient. 

Therefore, if the reciprocals are to be compared, the 
same method should be used each time the reciprocals 
are generated. Cray FORTRAN (CFT) uses a consistent
method and ensures the reciprocals of numbers are 
always the same. 

******************************************************* 

For example, two 64-element vectors are divided in 3 * 64 CPs plus
overhead. (The overhead associated with the functional units for this 
case is 38 CPs.) 



LOGICAL OPERATIONS

Scalar and vector logical units perform bit-by-bit manipulation of 64-bit 
quantities. Operations provide for forming logical products, 
differences, sums, and merges. 

A logical product is the AND function:

     Operand 1     1 0 1 0
     Operand 2     1 1 0 0
     Result        1 0 0 0

An operation similar to the AND function produces the following results:

     Operand 1     1 0 1 0
     Operand 2     1 1 0 0
     Result        0 1 0 0

The logical product (AND) operation is used for masking operations where
the ones specify the bits to be saved. In this variant of the AND
function, the zeros specify the bits to be saved (Operand 1 is the mask).

A logical sum is the inclusive OR function:

     Operand 1     1 0 1 0
     Operand 2     1 1 0 0
     Result        1 1 1 0

A logical difference is the exclusive OR function:

     Operand 1     1 0 1 0
     Operand 2     1 1 0 0
     Result        0 1 1 0

A logical equivalence is the exclusive NOR function:

     Operand 1     1 0 1 0
     Operand 2     1 1 0 0
     Result        1 0 0 1

The merge uses two operands and a mask to produce results as follows:

Operand 1 10101010 

Operand 2 11001100 

Mask 11110000 

Result 10101100 

The bits of operand 1 pass where the mask bit is 1. The bits of operand 
2 pass where the mask bit is 0. 
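
The logical operations described in this subsection can be written out as a brief Python sketch. The function names and the width parameter are assumptions introduced here so that the 4-bit and 8-bit examples can be reproduced directly; the functional units themselves operate on 64-bit quantities.

```python
def logical_product(a, b):
    """AND: ones in either operand select the bits that are saved."""
    return a & b


def masked_product(mask, b, width=64):
    """AND variant: zeros in the mask (operand 1) select the saved bits."""
    return ~mask & b & ((1 << width) - 1)


def logical_sum(a, b):
    """Inclusive OR."""
    return a | b


def logical_difference(a, b):
    """Exclusive OR."""
    return a ^ b


def logical_equivalence(a, b, width=64):
    """Exclusive NOR, limited to the given operand width."""
    return ~(a ^ b) & ((1 << width) - 1)


def merge(a, b, mask, width=64):
    """Bits of a pass where the mask is 1; bits of b pass where it is 0."""
    full = (1 << width) - 1
    return (a & mask) | (b & ~mask & full)


if __name__ == "__main__":
    a, b = 0b1010, 0b1100
    print(format(logical_product(a, b), "04b"))
    print(format(masked_product(a, b, width=4), "04b"))
    print(format(logical_sum(a, b), "04b"))
    print(format(logical_difference(a, b), "04b"))
    print(format(logical_equivalence(a, b, width=4), "04b"))
    print(format(merge(0b10101010, 0b11001100, 0b11110000, width=8), "08b"))
```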

CPU INSTRUCTIONS



INSTRUCTION FORMAT

Each instruction used in the computer is either a 1-parcel (16-bit)
instruction or a 2-parcel (32-bit) instruction. Instructions are packed
four parcels per word. Parcels in a word are numbered 0 through 3 from
left to right and any parcel position can be addressed in branch 
instructions. A 2-parcel instruction begins in any parcel of a word and 
can span a word boundary. For example, a 2-parcel instruction beginning 
in the fourth parcel of a word ends in the first parcel of the next 
word. No padding to word boundaries is required. Figure 5-1 illustrates 
the general form of instructions. 



          First parcel              Second parcel

       g   h   i   j   k                  m
     | 4 | 3 | 3 | 3 | 3 |               16               |  Bits



Figure 5-1. General form for instructions 



Four variations of this general format use the fields differently; two 
forms are 1-parcel formats and two are 2-parcel formats. The formats of 
these four variations are described below. 
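
As an illustration of the first-parcel layout, the following Python sketch splits a 16-bit parcel into its g, h, i, j, and k fields. It assumes that g occupies the high-order bits, as the left-to-right order of the figure and the octal codes in this section suggest; the type and function names are hypothetical.

```python
from typing import NamedTuple


class ParcelFields(NamedTuple):
    g: int  # 4 bits
    h: int  # 3 bits
    i: int  # 3 bits
    j: int  # 3 bits
    k: int  # 3 bits


def decode_parcel(parcel: int) -> ParcelFields:
    """Split a 16-bit parcel into g, h, i, j, k, high-order field first."""
    if not 0 <= parcel <= 0xFFFF:
        raise ValueError("a parcel is a 16-bit quantity")
    return ParcelFields(
        g=(parcel >> 12) & 0xF,
        h=(parcel >> 9) & 0o7,
        i=(parcel >> 6) & 0o7,
        j=(parcel >> 3) & 0o7,
        k=parcel & 0o7,
    )


if __name__ == "__main__":
    # 001413 is the octal code this section lists for CLN 1.
    print(decode_parcel(0o001413))
```

With this layout the leading octal digits of an instruction code correspond to the gh operation code and the i field, and the last two digits to j and k.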



1-PARCEL INSTRUCTION FORMAT WITH DISCRETE j AND k FIELDS

The most common of the 1-parcel instruction formats uses the i, j, 
and k fields as individual designators for operand and result registers 
(see figure 5-2) . The g and h fields define the operation code. The 
i field designates a result register and the j and k fields designate 
operand registers. Some instructions ignore one or more of the i, j, 
and k fields. The following types of instructions use this format. 

• Arithmetic

• Logical 

• Double shift 

• Floating-point constant 

       g   h   i   j   k
     | 4 | 3 | 3 | 3 | 3 |  Bits

     Operation   Register
     code        designators

Figure 5-2. 1-parcel instruction format 
with discrete j and k fields 



1-PARCEL INSTRUCTION FORMAT WITH COMBINED j AND k FIELDS

Some 1-parcel instructions use the j and k fields as a combined 6-bit
field (see figure 5-3) . The g and h fields contain the operation 
code, and the i field is generally a destination register identifier. 
The combined j and k fields generally contain a constant or a B or T 
register designator. The branch instruction 005 and the following types 
of instructions use the 1-parcel instruction format with combined j and 
k fields. 

• Constant 

• B and T register block memory transfer 

• B and T register data transfer 

• Single shift 

• Mask 



       g   h   i     jk
     | 4 | 3 | 3 |   6   |  Bits

     Operation   Result     Constant or
     code        register   register
                            designator

Figure 5-3. 1-parcel instruction format 
with combined j and k fields 



2-PARCEL INSTRUCTION FORMAT WITH COMBINED j, k, AND m FIELDS

The instruction type for a 22-bit immediate constant uses the combined
j, k, and m fields to hold the constant. The 7-bit gh field contains
an operation code, and the 3-bit i field designates a result register.
The instruction type using this format transfers the 22-bit jkm constant
to an A or S register.

The instruction type used for scalar memory transfers also requires a
22-bit jkm field for an address displacement. This instruction type
uses the 4-bit g field for an operation code, the 3-bit h field to
designate an address index register, and the 3-bit i field to designate
a source or result register. (See subsection on Special Register Values.)

Figure 5-4 shows the two general applications for the 2-parcel instruction 
format with combined j, k, and m fields. 



          First parcel              Second parcel

       g   h   i   j   k                  m
     | 4 | 3 | 3 |              22                  |  Bits

     Operation   Result     Constant
     code (gh)   register
                 (i)


          First parcel              Second parcel

       g   h   i   j   k                  m
     | 4 | 3 | 3 |              22                  |  Bits

     Operation   Address    Source or   Address or
     code (g)    register   result      displacement
                 used as    register
                 index (h)  (i)

Figure 5-4. 2-parcel instruction format
            with combined j, k, and m fields



2-PARCEL INSTRUCTION FORMAT WITH COMBINED i, j, k, AND m FIELDS

The 2-parcel instruction type for a branch (figure 5-5) uses the combined
i, j, k, and m fields to contain the 24-bit address that allows
branching to an instruction parcel. A 7-bit operation code (gh) is
followed by an ijkm field. The high-order bit of the i field is clear.

The 2-parcel instruction type for a 24-bit immediate constant (figure
5-6) uses the combined i, j, k, and m fields to hold the constant. This
instruction type uses the 4-bit g field for an operation code and the
3-bit h field to designate the result address register. The high-order
bit of the i field is set.

          First parcel              Second parcel

       g   h       i   j   k              m
     | 4 | 3 | 0 |          22            | 2 |  Bits

     Operation   Clear      Address         Parcel
     code (gh)   bit                        select

Figure 5-5. 2-parcel instruction formats for a branch
            with combined i, j, k, and m fields



          First parcel              Second parcel

       g   h       i   j   k              m
     | 4 | 3 | 1 |             24              |  Bits

     Operation   Result     Set    Constant
     code (g)    register   bit
                 (h)

Figure 5-6. 2-parcel instruction formats for a 24-bit immediate
            constant with combined i, j, k, and m fields



SPECIAL REGISTER VALUES 

If the S0 and A0 registers are referenced in the j or k fields of an
instruction, the contents of the respective register are not used;
instead, a special operand is generated. The special value is available
regardless of existing A0 or S0 reservations (which in this case are not
checked). This use does not alter the actual value of the S0 or A0
register. If S0 or A0 is used in the i field as the operand, the
actual value of the register is provided. The table below shows the
special register values.

     Field        Operand value

     Ah, h=0      0
     Ai, i=0      (A0)
     Aj, j=0      0
     Ak, k=0      1
     Si, i=0      (S0)
     Sj, j=0      0
     Sk, k=0      2^63
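
The operand selection rule in the table above can be sketched in Python as follows. The dictionaries, function names, and the representation of the register files as lists are assumptions made for illustration; the values substituted for a zero designator are those commonly documented for this architecture (0 for Ah, Aj, and Sj; 1 for Ak; 2^63 for Sk).

```python
# Special operands supplied when the designator in the given field is 0.
A_SPECIAL = {"h": 0, "j": 0, "k": 1}
S_SPECIAL = {"j": 0, "k": 1 << 63}


def read_a(a_regs, field, designator):
    """Return the operand seen for an A register named in the given field."""
    if designator == 0 and field in A_SPECIAL:
        return A_SPECIAL[field]
    return a_regs[designator]   # includes Ai with i=0, which reads (A0)


def read_s(s_regs, field, designator):
    """Return the operand seen for an S register named in the given field."""
    if designator == 0 and field in S_SPECIAL:
        return S_SPECIAL[field]
    return s_regs[designator]   # includes Si with i=0, which reads (S0)


if __name__ == "__main__":
    a_regs = [7, 10, 20, 30, 40, 50, 60, 70]
    s_regs = [9, 1, 2, 3, 4, 5, 6, 7]
    print(read_a(a_regs, "k", 0))   # special operand, not (A0)
    print(read_a(a_regs, "i", 0))   # actual contents of A0
    print(read_s(s_regs, "k", 0))   # special operand, not (S0)
```

Because the special operands never depend on register contents, the sketch never consults a_regs[0] or s_regs[0] for the h, j, or k fields, which mirrors the statement that reservations on A0 and S0 are not checked in those cases.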



INSTRUCTION ISSUE

Instructions are read one parcel at a time from the instruction buffers 
and delivered to the Next Instruction Parcel (NIP) register. The 
instruction is then passed to the Current Instruction Parcel (CIP) 
register when the previous instruction issues. An instruction in the CIP 
register issues when conditions in the functional unit and registers are 
such that functions required for execution can be performed without 
conflicting with a previously issued instruction. Instruction parcels 
can issue out of the CIP register at a maximum rate of one per clock 
period. 

Execution times (the time from issue to delivery of data to the 
destination operating registers) are fixed for instructions 000 through 
077, except those that reference memory (instructions 000, 004, branch 
instructions 005 through 017, and block transfer instructions 034 through 
037) . Scalar memory instructions 100 through 137 complete in variable 
lengths of time. Vector operation instructions 140 through 177 complete
in a fixed time if the instructions are not chained to memory fetches. 

Execution times can be affected by instruction 0034jk, which tests and
sets the semaphore designated by jk. If the semaphore is set,
instruction issue is held until another CPU clears that semaphore. If 
the semaphore is clear, the instruction issues and sets the semaphore. 
If all CPUs in a cluster are holding issue on a test and set, a flag is 
set in the Exchange Package (if not in monitor mode) and an exchange 
occurs. If an interrupt occurs while a test and set instruction is 
holding in the CIP register, a flag is set in the Exchange Package, CIP 
and NIP registers clear, and an exchange occurs with the P register 
pointing to the test and set instruction. 

Entry to the NIP register is blocked for the second parcel of a 2-parcel
instruction, leaving NIP blanked. Instead, the parcel is delivered to 
the Lower Instruction Parcel (LIP) register. The zeros in NIP (the 
pseudo second parcel) are transferred to CIP and issued as a do-nothing 
instruction. 

When special register values (A0 or S0) are selected by an instruction
for Ah, Aj, Ak, Sj, or Sk, the normal "hold issue until operand
ready" conditions do not apply. These values are always immediately 
available. 



INSTRUCTION DESCRIPTIONS 

This section contains detailed information about individual instructions 
or groups of related instructions. Each instruction begins with boxed 
information consisting of the Cray Assembly Language (CAL) syntax format, 
a brief description of each instruction, and the octal code sequence 
defined by the gh fields. The appearance of an m in a format
designates an instruction consisting of two parcels. 

Following the boxed information is a more detailed description of the 
instruction or instructions, including a list of hold issue conditions, 
execution time, and special cases. Hold issue conditions refer to those 
conditions delaying issue of an instruction until conditions are met. 

Instruction issue time assumes that if an instruction issues at clock
period n (CP n), the next instruction issues at CP n + issue time†
if its own issue conditions have been met.

The following special characters can appear in the operand field 
description of symbolic machine instructions and are used by the 
assembler in determining the operation to be performed. 

+ Arithmetic sum of adjoining registers 

- Arithmetic difference of adjoining registers 

* Arithmetic product of adjoining registers 
/ Division or reciprocal 

# Use ones complement 

> Shift value or form mask from left to right 

< Shift value or form mask from right to left 

& Logical product of adjoining registers 

! Logical sum of adjoining registers 

\ Logical difference of adjoining registers 



† Previous instruction issued

In some instructions, register designators are prefixed by the following
letters, which have special meaning to the assembler. 



F Floating-point operation 

H Half-precision operation 

R Rounded operation 

I Reciprocal iteration 

P Population count 

Q Population count parity 

Z Leading zero count 

INSTRUCTION 000



CAL Syntax Description Octal Code 



ERR Error exit 000000 



Instruction 000 is treated as an error condition and an exchange sequence 
occurs. Content of the instruction buffers is voided by the exchange 
sequence. Instruction 000 halts execution of an incorrectly coded 
program branching into an unused area of memory (if memory was 
backgrounded with zeros) or into a data area (if the data is positive 
integers, right-justified ASCII, or floating-point zero) . If monitor 
mode is not in effect, the Error Exit flag in the F register is set. All 
instructions issued before this instruction are run to completion. When 
results of previously issued instructions arrive at the operating 
registers, an exchange occurs to the Exchange Package designated by 
contents of the XA register. The program address stored during the 
exchange on the terminating exchange sequence is the contents of the P 
register advanced by one count (that is, the address of the instruction 
following the error exit instruction) . 



HOLD ISSUE CONDITIONS: Any A, S, or V register reserved 

EXECUTION TIME: Instruction issue, 40 CPs; this time includes an 

exchange sequence (24 CPs) and a fetch operation 
(16 CPs) . 

SPECIAL CASES: None 

INSTRUCTIONS 0010 - 0013



     CAL Syntax   Description                                     Octal Code

     CA,Aj Ak     Set the Current Address (CA) register for the   0010jk
                  channel indicated by (Aj) to (Ak) and activate
                  the channel

     CL,Aj Ak     Set the Limit Address (CL) register for the     0011jk
                  channel indicated by (Aj) to (Ak)

     CI,Aj        Clear the interrupt flag and error flag for     0012j0
                  the channel indicated by (Aj); clear device
                  master-clear (output channel)

     MC,Aj        Clear the interrupt flag and error flag for     0012j1
                  the channel indicated by (Aj); set device
                  master-clear (output channel); clear device
                  ready-held (input channel)

     XA Aj        Enter the XA register with (Aj)                 0013j0



Instructions 0010 through 0013 are privileged to monitor mode and provide 
operations useful to the operating system. Functions are selected 
through the i designator. Instructions are treated as pass 
instructions if the monitor mode bit is not set. 

When the i designator is 0, 1, or 2, the instruction controls operation 
of the I/O channels. Each channel has two registers directing the 
channel activity. The CA register for a channel contains the address of 
the current channel word. The CL register specifies the limit address. 
In programming the channel, the CL register is initialized first and then 
CA is set, activating the channel. As transfer continues, CA is
incremented toward CL. When (CA) is equal to (CL), transfer is complete
for words at initial (CA) through (CL)-1. When the j designator is 0
or when the 5 low-order bits of (Aj) are less than 6₈, the functions
are executed as pass instructions. Valid channel numbers are 6₈ through
17₈. When the k designator is 0, CA or CL is set to 1.

When the i designator is 3, the instruction transmits bits 2^11
through 2^4 of (Aj) to the XA register. When the j designator is 0,
the XA register is cleared.

Instruction 0012j0 is used to clear the device Master Clear. For 
instruction 0012, if the k designator is 1 for an output channel, the 
master clear is set; if the k designator is 1 for an input channel, the 
Ready flag is cleared. 

HOLD ISSUE CONDITIONS:  For instructions 0010 and 0011, Aj or Ak
                        reserved (except A0)

                        For instructions 0012 or 0013, Aj reserved
                        (except A0)

EXECUTION TIME:         Instruction issue, 1 CP

SPECIAL CASES:          If the program is not in monitor mode, the
                        instruction becomes a no-op although all hold
                        issue conditions remain effective.

                        For instructions 0010, 0011, and 0012:
                          If j=0, the instruction is a no-op.
                          If k=0, CA or CL is set to 1.
                          If the 5 low-order bits of (Aj) are less than
                          6₈, the instruction is a no-op. If the 5
                          low-order bits of (Aj) are greater than
                          17₈, undetermined results can occur. (That
                          is, 6₈ through 17₈ are valid, 20₈ through
                          37₈ are undetermined, 46₈ through 57₈ are
                          valid, etc.)

                        For instruction 0012:
                          The correct priority interrupting channel
                          number cannot be read (through instruction 033)
                          until 6 CPs after issue of instruction 0012.

                        For instruction 0013:
                          If j=0, XA register is cleared.



NOTE 

Because there is no hardware interlock among 
CPUs, it is possible to have more than one CPU 
issuing these instructions at the same time; 
however, undetermined results will occur. 

Software must ensure only one CPU is servicing 
I/O at a time while in monitor mode. 
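
The channel-number rules in the special cases for instructions 0010 through 0012 can be summarized in a short Python sketch. The function name and the three result strings are assumptions for illustration, and the sketch presumes the CPU is already in monitor mode.

```python
def classify_channel(aj_value, j):
    """Classify a channel instruction by j and the 5 low-order bits of (Aj).

    Returns "no-op", "valid", or "undetermined".
    """
    if j == 0:
        return "no-op"                 # j=0 makes the instruction a no-op
    low5 = aj_value & 0o37             # only the 5 low-order bits are used
    if low5 < 0o6:
        return "no-op"
    if low5 > 0o17:
        return "undetermined"
    return "valid"


if __name__ == "__main__":
    for value in (0o5, 0o6, 0o17, 0o20, 0o37, 0o46, 0o57):
        print(oct(value), classify_channel(value, j=1))
```

Because only the 5 low-order bits are examined, a value such as 46 octal is treated the same way as 6 octal, which is the pattern the parenthetical note in the special cases describes.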






INSTRUCTION 0014 



     CAL Syntax   Description                                     Octal Code

     RT Sj        Enter the Real-time Clock register with (Sj)    0014j0

     IP,j 1       Set interprocessor interrupt request of CPU j   0014j1

     IP 0         Clear received interprocessor interrupt         001402
                  request from all other processors

     CLN 0        Cluster number = 0                              001403

     CLN 1        Cluster number = 1                              001413

     CLN 2        Cluster number = 2                              001423

     CLN 3        Cluster number = 3                              001433

     CLN 4        Cluster number = 4                              001443

     CLN 5        Cluster number = 5                              001453

     PCI Sj       Enter Interrupt Interval (II) register          0014j4
                  with (Sj)

     CCI          Clear the programmable clock interrupt request  001405

     ECI          Enable programmable clock interrupt request     001406

     DCI          Disable programmable clock interrupt request    001407



Instruction 0014 performs specialized functions for managing the 
real-time and programmable clocks and handles interprocessor interrupt 
requests and cluster number operations. Instruction 0014 is privileged 
to monitor mode and is treated as a pass instruction if the monitor mode 
bit is not set. 

When the k designator is 0, the instruction loads the contents of the
Sj register into the RTC register. When the j designator is 0 or
(Sj)=0, the RTC register is cleared.

When the k designator is 1, the instruction sets the internal CPU
interrupt request in the CPU associated with PN=j. If the CPU
associated with PN=j is not in monitor mode, the Interrupt from
Internal CPU (ICP) flag sets in the F register, causing an interrupt.
The request remains until cleared by the receiving CPU issuing
instruction 001402. If the CPU associated with PN=j attempts to
interrupt itself, the instruction becomes a no-op.

When the k designator is 2, the instruction clears the internal CPU 
interrupt request set by any other CPU. 

When the k designator is 3, the instruction sets the cluster number to 
j to make the following cluster selections: 

CLN = 0 No cluster; all shared register and semaphore operations
        are no-ops (except SB, ST, or SM register reads, which
        return a value to Ai or Si).

CLN = 1 Cluster 1 

CLN = 2 Cluster 2 

CLN = 3 Cluster 3 

CLN = 4 Cluster 4 

CLN = 5 Cluster 5 

Clusters 1, 2, 3, 4 and 5 each have a separate set of SM, SB, and ST 
registers. 

When the k designator is 4, the instruction loads the low-order 32
bits from the Sj register into both the II register and the ICD
counter. When the j designator is 0 or (Sj)=0, II and ICD are
cleared.

When the k designator is 5, the instruction clears the programmable 
clock interrupt request if the request is previously set by ICD counting 
down to 0. 

When the k designator is 6, the instruction enables repeated 
programmable clock interrupt requests at a repetition rate determined by 
the value stored in the II register. 

When the k designator is 7, the instruction disables repeated 
programmable clock interrupt requests until an instruction 001406 is 
executed to enable the requests. 
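
The way the k designator selects among the functions of instruction 0014 can be shown as a simple look-up in Python. The table text paraphrases the descriptions above; the names K_FUNCTIONS and describe_0014 are hypothetical, and the field layout assumes g is the high-order field of the parcel.

```python
# Function selected by the k designator of instruction 0014 (paraphrased).
K_FUNCTIONS = {
    0: "load RTC register from (Sj)",
    1: "set interprocessor interrupt request of CPU j",
    2: "clear received interprocessor interrupt requests",
    3: "set cluster number to j",
    4: "load II register and ICD counter from low-order 32 bits of (Sj)",
    5: "clear programmable clock interrupt request",
    6: "enable programmable clock interrupt requests",
    7: "disable programmable clock interrupt requests",
}


def describe_0014(parcel):
    """Return (j, k, description) for a parcel whose code begins 0014."""
    gh = (parcel >> 9) & 0o177
    i = (parcel >> 6) & 0o7
    if gh != 0o001 or i != 4:
        raise ValueError("not an instruction 0014 parcel")
    j = (parcel >> 3) & 0o7
    k = parcel & 0o7
    return j, k, K_FUNCTIONS[k]


if __name__ == "__main__":
    for code in (0o001413, 0o001402, 0o001407):
        print(format(code, "06o"), describe_0014(code))
```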

HOLD ISSUE CONDITIONS:  Sj reserved (except S0)

                        For instruction 0014j3, hold issue 2+ CPs

EXECUTION TIME:         Instruction issue, 1 CP

SPECIAL CASES:          If the program is not in monitor mode, these
                        instructions become no-ops but all hold issue
                        conditions remain effective.

                        For instructions 0014j0 and 0014j4, if j=0,
                        (Sj)=0.

                        For instruction 0014j0, the value is entered
                        into the RTC register 4 CPs after instruction
                        0014j0 issues.

                        For instruction 0014j1, if the processor number
                        equals j of the CPU issuing this instruction,
                        the instruction becomes a no-op. (A CPU cannot
                        interrupt itself if j equals the processor
                        number of the CPU issuing this instruction.)



If more than one CPU attempts to access semaphores or shared 
registers in the same clock period, a scanner will resolve the 
conflict. See shared register explanation in section 2. 

INSTRUCTION 0015



     CAL Syntax   Description                                     Octal Code

     †            Select performance monitor                      0015j0

     †            Set maintenance read mode                       001501

     †            Load diagnostic checkbyte with S1               001511

     †            Set maintenance write mode 1                    001521

     †            Set maintenance write mode 2                    001531



These instructions are all privileged to monitor mode. 

Instruction 0015j0 selects one of four groups of hardware-related
events to be monitored by the performance counters. See Appendix C for a
description of how performance monitoring is accomplished.

Instructions 001501 through 001531 are used to check the operation of the 
modules concerned with SECDED and to verify error detection and 
correction. The maintenance mode switch on the mainframe's control panel 
must be switched on during execution of these instructions or they become 
no-ops. See Appendix D for a description of SECDED maintenance mode 
functions. 

Instructions 001501 and 001521 are used to verify check bit memory 
storage. Instruction 001501 allows the 8 check bits for SECDED to 
replace certain data bit positions in any subsequent memory read for the 
CPU path (including fetch and I/O). Instruction 001521 allows certain 
write data bits to replace the 8 check bits for SECDED for any subsequent 
CPU write to memory. 

Instructions 001511 and 001531 are used to verify error detection and 
correction. Instruction 001511 loads a diagnostic check byte with the 
high-order 8 bits of S1. Instruction 001531 enables a diagnostic check
byte to replace the 8 check bits for SECDED being written into memory for 
any subsequent write to memory. 



† Not supported at this time

INSTRUCTION 0020



     CAL Syntax   Description                                     Octal Code

     VL Ak        Transmit (Ak) to VL register                    00200k

     VL 1†        Transmit 1 to VL register                       002000



Instruction 00200k enters the VL register with a value determined by
the contents of Ak. The low-order 6 bits of (Ak) are entered into
the VL register. The 7th bit of VL is set if the 6 low-order bits of
(Ak)=0.

For example, if (Ak)=0 or a multiple of 100₈, then VL=100₈. The
content of VL is always between 1 and 100₈.

Instruction 002000 transmits the value of 1 to the VL register. 
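
The rule for forming the vector length can be expressed in a few lines of Python. The function name is an assumption; the sketch applies the special register value of 1 for Ak when k=0, as described in the subsection on special register values.

```python
def vl_from_ak(ak_value, k):
    """Return the VL register contents produced by instruction 00200k."""
    operand = 1 if k == 0 else ak_value   # Ak with k=0 supplies the value 1
    low6 = operand & 0o77                 # low-order 6 bits of the operand
    # If the low-order 6 bits are all zero, the 7th bit is set instead,
    # giving the maximum vector length of 100 octal (64).
    return 0o100 if low6 == 0 else low6


if __name__ == "__main__":
    for value in (0, 1, 63, 64, 65, 128):
        print(value, "->", vl_from_ak(value, k=1))
```

The result therefore always lies between 1 and 64, whatever value the A register holds.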



HOLD ISSUE CONDITIONS:  Ak reserved (except A0)

EXECUTION TIME:         Instruction issue, 1 CP
                        VL register ready, 1 CP

SPECIAL CASES:          Maximum vector length is 64.
                        (Ak)=1 if k=0.
                        (VL)=100₈ if k≠0 and (Ak)=0 or a
                        multiple of 100₈.



† Special CAL syntax
INSTRUCTIONS 0021 - 0027

CAL Syntax      Description                                   Octal Code

EFI             Enable interrupt on floating-point error      002100
DFI             Disable interrupt on floating-point error     002200
ERI             Enable interrupt on operand (address)         002300
                range error
DRI             Disable interrupt on operand (address)        002400
                range error
DBM             Disable bidirectional memory transfers        002500
EBM             Enable bidirectional memory transfers         002600
CMR             Complete memory references                    002700



Instruction 002100 sets the Floating-point Mode flag in the M register. 
Instruction 002200 clears the Floating-point Mode flag in the M 
register. The two instructions do not check the previous state of the 
flag. When set, the Floating-point Mode flag enables interrupts on 
floating-point range errors as described in section 4. Issuing either of 
these instructions also clears the Floating-Point Error Status flag. 

Instruction 002300 sets the Operand Range Mode flag in the M register. 
Instruction 002400 clears the Operand Range Mode flag in the M register. 
The two instructions do not check the previous state of the flag. When 
set, the Operand Range Mode flag enables interrupts on operand (address) 
range errors as described in section 3. 

Instruction 002500 disables the bidirectional memory mode. Instruction 
002600 enables the bidirectional memory mode. Block reads and writes can 
operate concurrently in bidirectional memory mode. If the bidirectional 
memory mode is disabled, only block reads can operate concurrently. 

Instruction 002700 assures completion of all memory references within a 
particular CPU issuing the instruction. Instruction 002700 does not 
issue until all memory references before this instruction are at the 
stage of execution where completion occurs in a fixed amount of time. 
For example, a load of any data previously stored by the CPU that issues
instruction 002700 (CMR) is assured of receiving the updated data if the
load is issued after the CMR instruction. This instruction, used in
conjunction with the semaphore instructions, can synchronize memory
references between processors.
INSTRUCTIONS 0021 - 0027 (continued)

HOLD ISSUE CONDITIONS:  Instructions 002500 and 002600, hold issue 2 CPs
                        Instruction 002700, Ports A, B, C busy
                        Instruction 002700, scalar memory reference
                        active in clock period 1, 2, or 3
                        Ak reserved (except A0)

EXECUTION TIME:         Instruction issue, 1 CP

SPECIAL CASES:          Instructions 002100 and 002200 are issued even
                        if there are other floating-point operations in
                        process resulting from previous issues. The
                        interrupts are enabled or disabled at CP + 1;
                        floating-point overflows occurring after that
                        time cause interrupts if they are enabled even
                        if the overflow is generated by a previously
                        issued floating-point instruction.

                        Instructions 002300 and 002400 are issued even
                        if there are other memory references in process
                        resulting from previous issues. The interrupts
                        are enabled or disabled at CP + 1; operand range
                        errors occurring after that time cause
                        interrupts if they are enabled even if the
                        operand range error is generated by a previous
                        memory reference.
INSTRUCTIONS 0030, 0034, 0036, and 0037

CAL Syntax      Description                                   Octal Code

VM  Sj          Transmit (Sj) to VM register                  0030j0
VM  0†          Clear VM register                             003000
SMjk  1,TS      Test and set semaphore jk, 0 ≤ jk ≤ 31₁₀      0034jk
SMjk  0         Clear semaphore jk, 0 ≤ jk ≤ 31₁₀             0036jk
SMjk  1         Set semaphore jk, 0 ≤ jk ≤ 31₁₀               0037jk



Instruction 0030j0 enters the VM register with the contents of Sj.
The VM register is cleared if the j designator is 0, as in instruction
003000. These instructions are used in conjunction with the vector merge
instructions (146 and 147) in which an operation is performed depending
on the contents of VM.

Instruction 0034jk tests and sets the semaphore designated by jk. If
the semaphore is set, issue is held until another CPU clears that
semaphore. If the semaphore is clear, the instruction issues and sets
the semaphore. If all CPUs in a cluster are holding issue on a test and
set, the DL flag is set in the Exchange Package (if not in monitor mode)
and an exchange occurs. If an interrupt occurs while a test and set
instruction is holding in the CIP register, the WS flag in the Exchange
Package sets, CIP and NIP registers clear, and an exchange occurs with
the P register pointing to the test and set instruction. The SM register
is 32 bits with SM0 being the most significant bit.

Instruction 0036jk clears the semaphore designated by jk.

Instruction 0037jk sets the semaphore designated by jk.
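
The semaphore operations can be sketched as a toy software model. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, and the model ignores cluster numbers, the DL and WS flags, and all timing. It only shows the bit numbering (SM0 as the most significant of 32 bits) and the issue/hold decision of test and set.

```python
class SemaphoreRegister:
    """Toy model of the 32-bit SM register; SM0 is the most significant bit."""

    WIDTH = 32

    def __init__(self) -> None:
        self.bits = 0

    def _mask(self, jk: int) -> int:
        if not 0 <= jk < self.WIDTH:
            raise ValueError("semaphore number must be in 0..31")
        # SM0 maps to bit 31 of the integer, SM31 to bit 0.
        return 1 << (self.WIDTH - 1 - jk)

    def test_and_set(self, jk: int) -> bool:
        """Model 0034jk: True if the instruction would issue, False if it would hold."""
        mask = self._mask(jk)
        if self.bits & mask:
            return False          # already set: issue is held
        self.bits |= mask         # clear: issue and set
        return True

    def clear(self, jk: int) -> None:
        """Model 0036jk."""
        self.bits &= ~self._mask(jk)

    def set(self, jk: int) -> None:
        """Model 0037jk."""
        self.bits |= self._mask(jk)
```

A caller that receives False from `test_and_set` corresponds to a CPU holding issue until some other CPU executes the clear.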

HOLD ISSUE CONDITIONS:  For instruction 0030j0:
                          Sj reserved (except S0)
                          Instruction 003 in process, unit busy 1 CP
                          Instruction 14x in process, unit busy (VL) + 5
                          CPs
                          Instruction 175 in process, unit busy (VL) + 5
                          CPs

† Special CAL syntax
INSTRUCTIONS 0030, 0034, 0036, and 0037 (continued)

HOLD ISSUE CONDITIONS:  For instructions 0034jk, 0036jk, and 0037jk:
(continued)               Hold issue 1+ CP†
                        For instruction 0034jk:
                          If current Cluster Number≠0 and SMjk is
                          set, holds issue until other CPU in the same
                          cluster clears the semaphore.

EXECUTION TIME:         Instruction issue, 1 CP

SPECIAL CASES:          (Sj)=0 if j=0.
                        Instructions 0034jk, 0036jk, and 0037jk
                        are no-ops if CLN=0.

† If more than one CPU attempts to access semaphores or shared
  registers in the same clock period, a scanner will resolve the
  conflict. See shared register explanation in section 2.
INSTRUCTION 004



CAL Syntax Description Octal Code 



EX Normal exit 004000 



Instruction 004 causes an exchange sequence which voids the contents of 
the instruction buffers. If monitor mode is not in effect, the Normal 
Exit flag in the F register is set. All instructions issued before this 
instruction are run to completion; that is, when all results arrive at 
the operating registers because of previously issued instructions, an 
exchange sequence occurs to the Exchange Package designated by the 
contents of the XA register. The program address stored into the 
Exchange Package is advanced one count from the address of the normal 
exit instruction. Instruction 004 is used to issue a monitor request 
from a user program. 



HOLD ISSUE CONDITIONS:  Any A, S, or V register reserved

EXECUTION TIME:         Instruction issue, 40 CPs; this time includes an
                        exchange sequence (24 CPs) and a fetch operation
                        (16 CPs).

SPECIAL CASES:          None
INSTRUCTION 005

CAL Syntax      Description         Octal Code

J  Bjk          Branch to (Bjk)     0050jk



Instruction 005 sets the P register to the 24-bit parcel address
specified by the contents of Bjk, causing execution to continue at that
address. The instruction is used to return from a subroutine.



HOLD ISSUE CONDITIONS:  Instruction 034 or 035 in process
                        Instruction 025 issued in the previous CP
                        Second parcel in a different buffer, 2 CP delay
                        Second parcel not in a buffer

EXECUTION TIME:         Instruction issue:
                          Instruction parcel and following parcel both
                          in a buffer and branch address in a buffer, 7
                          CPs
                          Instruction parcel and following parcel both
                          in a buffer and branch address not in a
                          buffer, 18 CPs. Additional time is needed if
                          a memory conflict exists. The time to resolve
                          a memory conflict depends on factors present.

SPECIAL CASES:          Instruction 0050jk executes as if it were a
                        2-parcel instruction. Even though the parcel
                        following the first parcel of instruction
                        0050jk is not used, it can cause a delay of
                        instruction 0050jk if it is out of buffer.
                        See execution times above.






INSTRUCTION 006

CAL Syntax      Description         Octal Code

J  exp          Branch to ijkm      006ijkm



The 2-parcel instruction 006 sets the P register to the parcel address 
specified by the low-order 24 bits of the ijkm field. Execution 
continues at that address. The high-order bit of the ijkm field is 
ignored. 



HOLD ISSUE CONDITIONS:  Second parcel in different buffer, 2 CP delay
                        Second parcel not in a buffer

EXECUTION TIME:         Instruction issue:
                          Both parcels of instruction in the same buffer
                          and branch address in a buffer, 5 CPs
                          Both parcels of instruction in the same buffer
                          and branch address not in a buffer, 16 CPs.
                          Additional time is needed if a memory conflict
                          exists. The time to resolve a memory conflict
                          depends on factors present.

SPECIAL CASES:          None
INSTRUCTION 007

CAL Syntax      Description                                Octal Code

R  exp          Return jump to ijkm; set B00 to (P)+2.     007ijkm



The 2-parcel instruction 007 sets register B00 to the address of the
parcel following the second parcel of the instruction. The P register is
then set to the parcel address specified by the low-order 24 bits of the
ijkm field. Execution continues at that address. The high-order bit
of the ijkm field is ignored. This instruction provides a return
linkage for subroutine calls. The subroutine is entered through a return
jump. The subroutine can return to the caller at the instruction
following the call by executing a branch to the contents of the B00
register.
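
The return linkage can be sketched as follows. This is an illustrative model only; the function name `return_jump` is hypothetical, and the sketch assumes that (P) is the parcel address of the first parcel of the instruction and that addresses wrap at 24 bits.

```python
PARCEL_ADDR_MASK = (1 << 24) - 1  # 24-bit parcel address


def return_jump(p: int, ijkm: int) -> tuple:
    """Model instruction 007 located at parcel address p.

    Returns (new_p, b00): B00 receives (P)+2, the parcel after the
    2-parcel instruction, and P receives the low-order 24 bits of ijkm.
    """
    b00 = (p + 2) & PARCEL_ADDR_MASK
    new_p = ijkm & PARCEL_ADDR_MASK  # masking drops the ignored high-order bit
    return new_p, b00
```

A subroutine entered this way returns by branching to (B00), which is what instruction 005 with jk=00 does.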



HOLD ISSUE CONDITIONS:  Instruction 034 or 035 in process
                        Second parcel in a different buffer, 2 CP delay
                        Second parcel not in a buffer

EXECUTION TIME:         Instruction issue:
                          Both parcels of instruction in the same buffer
                          and branch address in a buffer, 5 CPs
                          Both parcels of instruction in the same buffer
                          and branch address not in a buffer, 16 CPs.
                          Additional time is needed if a memory conflict
                          exists. The time to resolve a memory conflict
                          depends on factors present.

SPECIAL CASES:          None
INSTRUCTIONS 010 - 013

CAL Syntax      Description                                      Octal Code

JAZ  exp        Branch to ijkm if (A0)=0 (i2=0)                  010ijkm
JAN  exp        Branch to ijkm if (A0)≠0 (i2=0)                  011ijkm
JAP  exp        Branch to ijkm if (A0) positive, includes        012ijkm
                (A0)=0 (i2=0)
JAM  exp        Branch to ijkm if (A0) negative (i2=0)           013ijkm



The 2-parcel instructions 010 through 013 test the contents of AO for the 
condition specified by the h field. If the condition is satisfied, the 
P register is set to the parcel address specified by the low-order 24 
bits of the ijkm field and execution continues at that address. The 
high-order bit of the ijkm field must be 0. If the condition is not 
satisfied, execution continues with the instruction following the branch 
instruction. 
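
The four tests can be summarized in a short model. This is a sketch, not the hardware logic: the function name `branch_taken` is hypothetical, and it assumes that "negative" means the high-order bit (2²³) of the 24-bit A0 register is set.

```python
A_MASK = (1 << 24) - 1   # A registers are 24 bits wide
A_SIGN = 1 << 23         # assumed sign bit of A0


def branch_taken(h: int, a0: int) -> bool:
    """Return True if instruction 01h (h = 0..3) branches for the given (A0)."""
    a0 &= A_MASK
    negative = bool(a0 & A_SIGN)
    if h == 0:               # 010, JAZ: (A0) = 0
        return a0 == 0
    if h == 1:               # 011, JAN: (A0) != 0
        return a0 != 0
    if h == 2:               # 012, JAP: positive, zero included
        return not negative
    if h == 3:               # 013, JAM: negative
        return negative
    raise ValueError("h must be 0..3 for instructions 010 through 013")
```

Note that (A0)=0 satisfies both JAZ and JAP, consistent with zero being treated as a positive condition.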



HOLD ISSUE CONDITIONS:  A0 busy in any one of the previous 3 CPs
                        Second parcel in a different buffer, 2 CP delay
                        Second parcel not in a buffer

EXECUTION TIME:         Instruction issue for branch taken:
                          Both parcels of instruction in the same buffer,
                          branch taken, and branch address in a buffer;
                          5 CPs.
                          Both parcels of instruction in the same buffer,
                          branch taken, and branch address not in a
                          buffer; 16 CPs. Additional time is needed if a
                          memory conflict exists. The time to resolve a
                          memory conflict is indeterminate.
                          Both parcels of instruction in different
                          buffers, branch taken, and branch address in a
                          buffer; 7 CPs.
                          Both parcels of instruction in different
                          buffers, branch taken, and branch address not
                          in a buffer; 18 CPs.
INSTRUCTIONS 010 - 013 (continued)

EXECUTION TIME:           Second parcel of instruction not in a buffer,
(continued)               branch taken, and branch address in a buffer;
                          18 CPs.
                          Second parcel of instruction not in a buffer,
                          branch taken, and branch address not in buffer;
                          29 CPs.

                        Instruction issue for branch not taken:
                          Both parcels of instruction in the same buffer,
                          branch not taken, and next instruction in the
                          same instruction buffer; 2 CPs.
                          Both parcels of instruction in the same buffer,
                          branch not taken, and next instruction in
                          different instruction buffer; 4 CPs.
                          Both parcels of instruction in the same buffer
                          and branch not taken with next instruction in
                          memory; 16 CPs.
                          Both parcels of instruction in different
                          buffers and branch not taken; 4 CPs.
                          Second parcel of instruction not in a buffer
                          and branch not taken; 15 CPs.

NOTE

Whenever a fetch occurs, memory conflicts may produce a delay.

SPECIAL CASES:          (A0)=0 is considered a positive condition.
                        High-order bit of i designator (i2) must be 0.
INSTRUCTIONS 014 - 017

CAL Syntax      Description                                      Octal Code

JSZ  exp        Branch to ijkm if (S0)=0 (i2=0)                  014ijkm
JSN  exp        Branch to ijkm if (S0)≠0 (i2=0)                  015ijkm
JSP  exp        Branch to ijkm if (S0) positive, includes        016ijkm
                (S0)=0 (i2=0)
JSM  exp        Branch to ijkm if (S0) negative (i2=0)           017ijkm



The 2-parcel instructions 014 through 017 test the contents of S0 for the
condition specified by the h field. If the condition is satisfied, the
P register is set to the parcel address specified by the low-order 24
bits of the ijkm field and execution continues at that address. The
high-order bit of the ijkm field must be 0. If the condition is not
satisfied, execution continues with the instruction following the branch
instruction.



HOLD ISSUE CONDITIONS:  S0 busy in any one of the previous 3 CPs
                        Second parcel in a different buffer, 2 CP delay
                        Second parcel not in a buffer

EXECUTION TIME:         Instruction issue for branch taken:
                          Both parcels of instruction in the same buffer,
                          branch taken, and branch address in a buffer;
                          5 CPs.
                          Both parcels of instruction in the same buffer,
                          branch taken, and branch address not in a
                          buffer; 16 CPs. Additional time is needed if a
                          memory conflict exists. The time to resolve a
                          memory conflict is indeterminate.
                          Both parcels of instruction in different
                          buffers, branch taken, and branch address in a
                          buffer; 7 CPs.
                          Both parcels of instruction in different
                          buffers, branch taken, and branch address not
                          in a buffer; 18 CPs.
INSTRUCTIONS 014 - 017 (continued)

EXECUTION TIME:           Second parcel of instruction not in a buffer,
(continued)               branch taken, and branch address in a buffer;
                          18 CPs.
                          Second parcel of instruction not in a buffer,
                          branch taken, and branch address not in buffer;
                          29 CPs.

                        Instruction issue for branch not taken:
                          Both parcels of instruction in the same buffer,
                          branch not taken, and next instruction in the
                          same instruction buffer; 2 CPs.
                          Both parcels of instruction in the same buffer,
                          branch not taken, and next instruction in
                          different instruction buffer; 4 CPs.
                          Both parcels of instruction in the same buffer
                          and branch not taken with next instruction in
                          memory; 16 CPs.
                          Both parcels of instruction in different
                          buffers and branch not taken; 4 CPs.
                          Second parcel of instruction not in a buffer
                          and branch not taken; 15 CPs.

NOTE

Whenever a fetch occurs, memory conflicts may produce a delay.

SPECIAL CASES:          (S0)=0 is considered a positive condition.
                        High-order bit of i designator (i2) must be 0.
INSTRUCTION 01h

CAL Syntax      Description                      Octal Code

Ah  exp         Transmit ijkm to Ah (i2=1)       01hijkm



The 2-parcel instruction 01h enters a 24-bit value into Ah that is
composed of the low-order 24 bits of the ijkm field. The high-order
bit of the ijkm field must be set to distinguish the 01h instruction
from the 010-017 branches.



HOLD ISSUE CONDITIONS:  Ah reserved
                        Second parcel not in a buffer
                        Second parcel in a different buffer

EXECUTION TIME:         Instruction issue:
                          Both parcels in same buffer, 2 CPs
                          Both parcels in different buffers, 4 CPs
                        Ah ready, 1 CP

SPECIAL CASES:          High-order bit of i designator (i2) must be 1.
INSTRUCTIONS 020 - 021

CAL Syntax      Description                                  Octal Code

Ai  exp         Transmit jkm to Ai                           020ijkm
Ai  exp         Transmit ones complement of jkm to Ai        021ijkm



The 2-parcel instruction 020 enters a 24-bit value into Ai composed of
the 22-bit jkm field and 2 high-order bits of 0.

The 2-parcel instruction 021 enters a 24-bit value that is the complement
of a value formed by the 22-bit jkm field and 2 high-order bits of 0
into Ai. The complement is formed by changing all 1 bits to 0 and all 0
bits to 1. Thus, for instruction 021, the high-order 2 bits of Ai
are set to 1. The instruction provides a means of entering a negative
value into Ai. However, if the instruction is used to enter a negative
number, the positive number used in the jkm field must be one smaller
than the absolute value of the expected final negative number.
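
The "one smaller" rule follows from the difference between ones and twos complement. The sketch below illustrates it; all function names are hypothetical, and the model assumes Ai is interpreted as a 24-bit twos complement integer.

```python
A_MASK = (1 << 24) - 1
JKM_LIMIT = 1 << 22


def load_complement(jkm: int) -> int:
    """Model instruction 021: the 24-bit ones complement of the 22-bit jkm field."""
    if not 0 <= jkm < JKM_LIMIT:
        raise ValueError("jkm must fit in 22 bits")
    return ~jkm & A_MASK


def jkm_for_negative(value: int) -> int:
    """jkm needed so that instruction 021 leaves the negative `value` in Ai."""
    if value >= 0:
        raise ValueError("value must be negative")
    return -value - 1  # one smaller than the absolute value


def to_signed24(word: int) -> int:
    """Interpret a 24-bit word as a twos complement integer."""
    return word - (1 << 24) if word & (1 << 23) else word
```

For example, entering -5 requires jkm=4, because the ones complement of 4 is the twos complement representation of -5.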



HOLD ISSUE CONDITIONS:  Ai reserved
                        Second parcel not in a buffer

EXECUTION TIME:         Instruction issue:
                          Both parcels in same buffer, 2 CPs
                          Both parcels in different buffers, 4 CPs
                        Ai ready, 1 CP

SPECIAL CASES:          None
INSTRUCTION 022

CAL Syntax      Description           Octal Code

Ai  exp         Transmit jk to Ai     022ijk



Instruction 022 enters the 6-bit quantity from the jk field into the 
low-order 6 bits of Ai. The high-order 18 bits of Ai are zeroed. No 
sign extension occurs. 



HOLD ISSUE CONDITIONS:  Ai reserved

EXECUTION TIME:         Instruction issue, 1 CP
                        Ai ready, 1 CP

SPECIAL CASES:          None
INSTRUCTION 023

CAL Syntax      Description             Octal Code

Ai  Sj          Transmit (Sj) to Ai     023ij0
Ai  VL          Read vector length      023i01



Instruction 023ij0 enters the low-order 24 bits of (Sj) into Ai. The
high-order bits of (Sj) are ignored.

Instruction 023i01 enters the content of the VL register into Ai.



HOLD ISSUE CONDITIONS:  Ai reserved
                        For instruction 023ij0, Sj reserved
                        (except S0)

EXECUTION TIME:         Instruction issue, 1 CP
                        Ai ready, 1 CP

SPECIAL CASES:          (Sj)=0 if j=0.

                        If (A1)=0, the sequence:
                            VL  A1
                            A2  VL
                        leaves (A2)=100₈

                        If (A1)=23₈, the sequence:
                            VL  A1
                            A2  VL
                        leaves (A2)=23₈

                        If (A1)=123₈, the sequence:
                            VL  A1
                            A2  VL
                        leaves (A2)=23₈

                        The 2⁶ bit in the VL is a 1 if the low-order 6
                        bits are 0; otherwise, the 2⁶ bit is a 0.
INSTRUCTIONS 024 - 025

CAL Syntax      Description               Octal Code

Ai  Bjk         Transmit (Bjk) to Ai      024ijk
Bjk  Ai         Transmit (Ai) to Bjk      025ijk



Instruction 024 enters the contents of Bjk into Ai.
Instruction 025 enters the contents of Ai into Bjk.



HOLD ISSUE CONDITIONS:  Instruction 034 or 035 in process
                        For instruction 024ijk, instruction 025ijk
                        issued in previous CP
                        Ai reserved

EXECUTION TIME:         Instruction issue, 1 CP
                        For instruction 024, Ai ready, 1 CP

SPECIAL CASES:          None
INSTRUCTION 026

CAL Syntax      Description                                Octal Code

Ai  PSj         Population count of (Sj) to Ai             026ij0
Ai  QSj         Population count parity of (Sj) to Ai      026ij1
Ai  SBj         Transfer (SBj) to Ai                       026ij7



Instruction 026ij0 counts the number of bits set to 1 in (Sj) and
enters the result into the low-order 7 bits of Ai. The high-order 17
bits of Ai are zeroed. If (Sj)=0, then (Ai)=0.

Instruction 026ij1 counts the number of bits set to 1 in (Sj). Then,
the low-order bit, showing the odd/even state of the result, is
transferred to the low-order bit position of the Ai register. The
high-order 23 bits are cleared. The actual population count is not
transferred.

Instructions 026ij0 and 026ijl are executed in the Population/ 
Leading Zero Count functional unit. 
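
The two population count results can be expressed in a few lines. This is an illustrative sketch of the arithmetic only; the function names are hypothetical and (Sj) is modeled as a 64-bit unsigned integer.

```python
S_MASK = (1 << 64) - 1  # S registers are 64 bits wide


def population_count(sj: int) -> int:
    """Model 026ij0: number of 1 bits in (Sj); the result fits in 7 bits (0..64)."""
    return bin(sj & S_MASK).count("1")


def population_parity(sj: int) -> int:
    """Model 026ij1: low-order bit of the population count (1 = odd, 0 = even)."""
    return population_count(sj) & 1
```

The maximum count of 64 is why 7 result bits are needed rather than 6.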

Instruction 026ij7 transfers the contents of the SBj register shared 
between the CPUs to Ai. 



HOLD ISSUE CONDITIONS:  Ai reserved
                        Sj reserved (except S0)
                        For instruction 026ij7, hold issue 1 CP, then
                        2+† CP more after Ai not reserved.
                        Minimum 3 CP hold.

EXECUTION TIME:         Instruction issue, 1 CP
                        For instructions 026ij0 and 026ij1, Ai
                        ready 4 CPs
                        For instruction 026ij7, Ai ready 1 CP

SPECIAL CASES:          For instructions 026ij0 and 026ij1, (Ai)=0 if j=0.
                        For instruction 026ij7, (Ai)=0 if CLN=0.

† If more than one CPU attempts to access semaphores or shared
  registers in the same clock period, a scanner will resolve the
  conflict. See shared register explanation in section 2.
INSTRUCTION 027

CAL Syntax      Description                            Octal Code

Ai  ZSj         Leading zero count of (Sj) to Ai       027ij0
SBj  Ai         Transfer (Ai) to SBj                   027ij7



Instruction 027ij0 counts the number of leading zeros in Sj and enters
the result into the low-order 7 bits of Ai. The high-order 17 bits of
Ai are zeroed. Instruction 027ij0 is executed in the Population/Leading
Zero Count functional unit.

Instruction 027ij7 stores (Ai) to the SBj register, which is shared
between the CPUs in the same cluster.
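
The leading zero count of 027ij0 can be sketched as follows. The function name is hypothetical, and (Sj) is modeled as a 64-bit unsigned integer; this is an illustration of the result, not of the functional unit.

```python
S_MASK = (1 << 64) - 1  # S registers are 64 bits wide


def leading_zero_count(sj: int) -> int:
    """Model 027ij0: number of 0 bits above the highest 1 bit of a 64-bit word."""
    return 64 - (sj & S_MASK).bit_length()
```

A zero operand gives 64, and any operand with the high-order (sign) bit set gives 0.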



HOLD ISSUE CONDITIONS:  Ai reserved
                        Sj reserved (except S0)
                        For instruction 027ij0, instruction 033 issued
                        in CP 2
                        For instruction 027ij7, hold issue 1 CP, then
                        2+† CP more after Ai not reserved. Minimum
                        3 CP hold.

EXECUTION TIME:         Instruction issue, 1 CP
                        For instruction 027ij0, Ai ready, 3 CPs
                        For instruction 027ij7, SBj ready 1 CP

SPECIAL CASES:          For instruction 027ij0, (Ai)=64 if j=0.
                        For instruction 027ij0, (Ai)=0 if (Sj) is
                        negative.
                        Instruction 027ij7 is a no-op if CLN=0.

† If more than one CPU attempts to access semaphores or shared
  registers in the same clock period, a scanner will resolve the
  conflict. See shared register explanation in section 2.
INSTRUCTIONS 030 - 031

CAL Syntax      Description                                   Octal Code

Ai  Aj+Ak       Integer sum of (Aj) and (Ak) to Ai            030ijk
Ai  Ak†         Transmit (Ak) to Ai                           030i0k
Ai  Aj+1†       Integer sum of (Aj) and 1 to Ai               030ij0
Ai  Aj-Ak       Integer difference (Aj) less (Ak) to Ai       031ijk
Ai  -1†         Transmit -1 to Ai                             031i00
Ai  -Ak†        Transmit the negative of (Ak) to Ai           031i0k
Ai  Aj-1†       Integer difference (Aj) less 1 to Ai          031ij0



Instruction 030 forms the integer sum of (Aj) and (Ak) and enters the
result into Ai. No overflow is detected.

Instruction 031 forms the integer difference of (Aj) and (Ak) and
enters the result into Ai. No overflow is detected.

Instructions 030 and 031 are executed in the Address Add functional unit. 
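
The special forms of these instructions fall out of how a zero designator is read. The sketch below models that behavior under the special cases listed for these instructions (a j designator of 0 supplies 0, a k designator of 0 supplies 1); the function names are hypothetical and results simply wrap at 24 bits, since no overflow is detected.

```python
A_MASK = (1 << 24) - 1  # A registers are 24 bits wide


def read_aj(a: list, j: int) -> int:
    """Operand from the j designator: 0 when j=0, otherwise (Aj)."""
    return 0 if j == 0 else a[j] & A_MASK


def read_ak(a: list, k: int) -> int:
    """Operand from the k designator: 1 when k=0, otherwise (Ak)."""
    return 1 if k == 0 else a[k] & A_MASK


def address_add(a: list, i: int, j: int, k: int, subtract: bool = False) -> int:
    """Model 030ijk (subtract=False) or 031ijk (subtract=True); result goes to Ai."""
    aj, ak = read_aj(a, j), read_ak(a, k)
    a[i] = (aj - ak if subtract else aj + ak) & A_MASK
    return a[i]
```

With this model, 031i00 yields all ones (the 24-bit representation of -1) and 030ij0 yields (Aj)+1, matching the special CAL forms in the table.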



HOLD ISSUE CONDITIONS:  Ai reserved
                        Aj or Ak reserved (except A0)

EXECUTION TIME:         Instruction issue, 1 CP
                        Ai ready, 2 CPs

SPECIAL CASES:          For instruction 030:
                          (Ai) = (Ak) if j=0 and k≠0.
                          (Ai) = 1 if j=0 and k=0.
                          (Ai) = (Aj) + 1 if j≠0 and k=0.

                        For instruction 031:
                          (Ai) = -(Ak) if j=0 and k≠0.
                          (Ai) = -1 if j=0 and k=0.
                          (Ai) = (Aj) - 1 if j≠0 and k=0.

† Special CAL syntax
INSTRUCTION 032

CAL Syntax      Description                                  Octal Code

Ai  Aj*Ak       Integer product of (Aj) and (Ak) to Ai       032ijk



Instruction 032 forms the integer product of (Aj) and (Ak) and enters
the low-order 24 bits of the result into Ai. No overflow is detected.

Instruction 032 is executed in the Address Multiply functional unit. 



HOLD ISSUE CONDITIONS:  Ai reserved
                        Aj or Ak reserved (except A0)

EXECUTION TIME:         Instruction issue, 1 CP
                        Ai ready, 4 CPs

SPECIAL CASES:          (Ai)=0 if j=0.
                        (Ak)=1 if k=0.
                        Thus, (Ai) = (Aj) if j≠0 and k=0.
INSTRUCTION 033

CAL Syntax      Description                                      Octal Code

Ai  CI          Channel number of highest priority interrupt     033i00
                request to Ai
Ai  CA,Aj       Current address of channel (Aj) to Ai            033ij0
Ai  CE,Aj       Error flag of channel (Aj) to Ai                 033ij1



Instruction 033 enters channel status information into Ai. The j and
k designators and the contents of Aj define the desired information.

The channel number of the highest priority interrupt request is entered
into Ai when the j designator is 0. The contents of Aj specify a
channel number when the j designator is nonzero. The value of the
Current Address (CA) register for the channel is entered into Ai when
the k designator is 0. The error flag for the channel is entered into
the low-order bit of Ai when the k designator is 1. The high-order
bits of Ai are cleared. The error flag can be cleared only in monitor
mode using instruction 0012.

Instruction 033 does not interfere with channel operation and is not 
protected from user execution. 



HOLD ISSUE CONDITIONS:  Ai reserved
                        Aj reserved (except A0)

EXECUTION TIME:         Instruction issue, 1 CP
                        Ai ready, 4 CPs

SPECIAL CASES:          (Ai)=Highest priority channel causing interrupt
                        if (Aj)=0.
                        (Ai)=Current address of channel (Aj) if
                        (Aj)≠0 and k=0.
                        (Ai)=I/O error flag of channel (Aj) if
                        (Aj)≠0 and k=1.
INSTRUCTION 033 (continued)

SPECIAL CASES:          6 CPs must elapse after instruction 0012j0 issues
(continued)             before issuing instruction 033i00.

                        Before the results of a 033ij0 instruction to
                        channels 10-17 or a 033ij1 instruction to channels
                        6 or 7 are valid, there is a 12 CP latency.
                        Therefore, before a 033ijx instruction can be
                        issued to these channels, 12 CPs must elapse after
                        issuing a channel function or completing a channel
                        transfer.

                        If instruction 033 issues every 10 CPs (in a loop),
                        the same results will always be returned to Ai.

                        When k=1 and (Aj)=6 or 7:
                          Bits 2⁰ through 2¹⁷ contain the remaining
                          block length.
                          Bit 2¹⁸ indicates a request in progress.
                          Bit 2¹⁹ will return a 0.
                          Bit 2²⁰ indicates a block length error.
                          Bit 2²¹ indicates either an SSD double-bit
                          memory error (during a read SSD operation) or an
                          SSD double-bit channel error (during a write SSD
                          operation).
                          Bit 2²² indicates a CPU double-bit memory error.
                          Bit 2²³ indicates a fatal error (if bit 2²⁰,
                          2²¹, or 2²² is set).
INSTRUCTIONS 034 - 037

CAL Syntax        Description                                       Octal Code

Bjk,Ai  ,A0       Block transfer (Ai) words from memory starting    034ijk
                  at address (A0) to B registers starting at
                  register jk

Bjk,Ai  0,A0†     Block transfer (Ai) words from memory starting    034ijk
                  at address (A0) to B registers starting at
                  register jk

,A0  Bjk,Ai       Block transfer (Ai) words from B registers        035ijk
                  starting at register jk to memory starting
                  at address (A0)

0,A0  Bjk,Ai†     Block transfer (Ai) words from B registers        035ijk
                  starting at register jk to memory starting
                  at address (A0)

Tjk,Ai  ,A0       Block transfer (Ai) words from memory starting    036ijk
                  at address (A0) to T registers starting at
                  register jk

Tjk,Ai  0,A0†     Block transfer (Ai) words from memory starting    036ijk
                  at address (A0) to T registers starting at
                  register jk

,A0  Tjk,Ai       Block transfer (Ai) words from T registers        037ijk
                  starting at register jk to memory starting
                  at address (A0)

0,A0  Tjk,Ai†     Block transfer (Ai) words from T registers        037ijk
                  starting at register jk to memory starting
                  at address (A0)



Instructions 034 through 037 perform block transfers between memory and B 
or T registers. 

In all the instructions, the amount of data transferred is specified by
the low-order 7 bits of (Ai). See special cases for details.

The first register involved in the transfer is specified by jk.
Successive transfers involve successive B or T registers until B77 or T77
is reached. Since processing of the registers is circular, B00 is
processed after B77 and T00 is processed after T77 if the count in (Ai)
is not exhausted.



† Special CAL syntax

INSTRUCTIONS 034 - 037 (continued)

The first memory location referenced by the transfer instruction is
specified by (A0). The A0 register contents are not altered by execution
of the instruction. Memory references are incremented by 1 for
successive transfers.

For transfers of B registers to memory, each 24-bit value is right
adjusted in the word; the high-order 40 bits are zeroed. When transferring
from memory to B registers, only the low-order 24 bits are transmitted;
the high-order 40 bits are ignored.
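
The circular register addressing and the 7-bit count can be sketched for the memory-to-B-register direction (instruction 034). This is an illustration only: the function name `block_load_b` is hypothetical, memory is modeled as a Python list of 64-bit words, and port usage and timing are ignored.

```python
B_MASK = (1 << 24) - 1   # B registers hold 24 bits
NUM_B = 64               # B00 through B77 (octal)


def block_load_b(memory: list, b_regs: list, a0: int, ai: int, jk: int) -> int:
    """Model 034ijk: copy words from memory at (A0) into B registers from Bjk.

    Returns the number of words transferred, taken from the low-order
    7 bits of (Ai). Register selection wraps from B77 back to B00.
    """
    count = ai & 0o177
    for n in range(count):
        # Only the low-order 24 bits of each memory word reach the B register.
        b_regs[(jk + n) % NUM_B] = memory[a0 + n] & B_MASK
    return count
```

For example, a transfer of 4 words starting at B76 fills B76, B77, B00, and B01 in that order.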



HOLD ISSUE CONDITIONS:  A0 reserved
                        Ai reserved
                        Scalar reference in CP1, CP2, or CP3
                        For instruction 034, Port A busy or instruction
                        035 in process or uni-directional memory mode and
                        Port C busy
                        For instruction 035, Port C busy or instruction
                        034 in process or uni-directional memory mode and
                        Port A or Port B busy
                        For instruction 036, Port B busy or instruction
                        037 in process or uni-directional memory mode and
                        Port C busy
                        For instruction 037, Port C busy or instruction
                        036 in process or uni-directional memory mode and
                        Port A or Port B busy

EXECUTION TIME:         Instruction issue, 1 CP
                        For instruction 034 or 036:
                          B or T register reserved 16 CPs + (Ai) if
                          (Ai)≠0; 6 CPs if (Ai)=0.
                          Port A or B busy for (Ai) + 6 CPs if
                          (Ai)≠0; 4 CPs if (Ai)=0.
                        For instruction 035 or 037:
                          B or T register reserved 5 CPs + (Ai) if
                          (Ai)≠0; 4 CPs if (Ai)=0.
                          Port C busy for (Ai) + 6 CPs if (Ai)≠0;
                          4 CPs if (Ai)=0.
INSTRUCTIONS 034 - 037 (continued)

SPECIAL CASES:          (Ai)=0 causes a zero-block transfer.
                        (Ai) in the range greater than 100₈ and less
                        than 200₈ causes a wrap-around condition.
                        If (Ai) is greater than 177₈, bits 2⁷
                        through 2²³ are truncated. The block length is
                        equal to the value of bits 2⁰ through 2⁶.

NOTE

Instruction 034 uses Port A, instruction 035 uses Port
C, instruction 036 uses Port B, and instruction 037
uses Port C.
INSTRUCTIONS 040 - 041

CAL Syntax      Description                             Octal Code

Si  exp         Transmit jkm to Si                      040ijkm
Si  exp         Transmit complement of jkm to Si        041ijkm



The 2-parcel instructions 040 and 041 enter immediate values into an S 
register. 

Instruction 040 enters a 64-bit value composed of the 22-bit jkm field
and 42 high-order bits of 0 into Si.

Instruction 041 enters a 64-bit value that is the complement of a value
formed by the 22-bit jkm field and 42 high-order bits of 0 into Si.
The complement is formed by changing all 1 bits to 0 and all 0 bits
to 1. Thus, for instruction 041, the high-order 42 bits of Si are set
to 1's. The instruction provides for entering a negative value into
Si. Since the register value is the ones complement of jkm, to get
the twos complement, jkm should be 0 to get -1, 1 to get -2, 3 to get
-4, etc.



HOLD ISSUE CONDITIONS: Si reserved 

Second parcel not in a buffer 

EXECUTION TIME: Instruction issue: 

Both parcels in same buffer, 2 CPs 

Both parcels in different buffers, 4 CPs 

Si ready, 1 CP 

SPECIAL CASES: None 
INSTRUCTIONS 042 - 043

CAL Syntax      Description                                      Octal Code

Si  <exp        Form exp bits of ones mask in Si from right;     042ijk
                jk field gets 64-exp.
Si  #>exp†      Form exp bits of zeros mask in Si from left;     042ijk
                jk field gets exp.
Si  1†          Enter 1 into Si                                  042i77
Si  -1†         Enter -1 into Si                                 042i00
Si  >exp        Form exp bits of ones mask in Si from left;      043ijk
                jk field gets exp.
Si  #<exp†      Form exp bits of zeros mask in Si from right;    043ijk
                jk field gets 64-exp.
Si  0†          Clear Si                                         043i00



Instruction 042 generates a mask of 64-jk ones from right to left in
Si. For example, if jk=0, Si contains all 1 bits (integer value= -1)
and if jk=77₈, Si contains zeros in all but the low-order bit
(integer value=1).

Instruction 043 generates a mask of jk ones from left to right in Si.
For example, if jk=0, Si contains all 0 bits (integer value=0) and if
jk=77₈, Si contains ones in all but the low-order bit (integer value= -2).

Instructions 042 and 043 are executed in the Scalar Logical functional 
unit. 
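
A minimal Python sketch of the two mask rules follows. The function names are hypothetical and jk is taken as a 6-bit value (0 through 77₈); the sketch only reproduces the bit patterns described above. 

```python
MASK64 = (1 << 64) - 1


def mask_042(jk: int) -> int:
    """042: 64-jk one bits, built from the right (low-order) end."""
    if not 0 <= jk <= 0o77:
        raise ValueError("jk is a 6-bit field")
    return (1 << (64 - jk)) - 1


def mask_043(jk: int) -> int:
    """043: jk one bits, built from the left (high-order) end."""
    if not 0 <= jk <= 0o77:
        raise ValueError("jk is a 6-bit field")
    return (MASK64 << (64 - jk)) & MASK64


def as_signed(word: int) -> int:
    """Interpret a 64-bit word as a twos complement integer."""
    return word - (1 << 64) if word >> 63 else word
```

For every jk the two results are disjoint and together cover all 64 bits, which is why the special syntax forms for zeros masks can reuse the same two opcodes. 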



HOLD ISSUE CONDITIONS:   Si reserved 

EXECUTION TIME:          Instruction issue, 1 CP 
                         Si ready, 1 CP 

SPECIAL CASES:           None 



† Special CAL syntax 
INSTRUCTIONS 044 - 051 



CAL Syntax        Description                                      Octal Code 

Si   Sj&Sk        Logical product of (Sj) and (Sk) to Si           044ijk 

Si   Sj&SB†       Sign bit of (Sj) to Si                           044ij0 

Si   SB&Sj†       Sign bit of (Sj) to Si (j≠0)                     044ij0 

Si   #Sk&Sj       Logical product of (Sj) and complement of        045ijk 
                  (Sk) to Si 

Si   #SB&Sj       (Sj) with sign bit cleared to Si                 045ij0 

Si   Sj\Sk        Logical difference of (Sj) and (Sk) to Si        046ijk 

Si   Sj\SB†       Toggle sign bit of (Sj), then enter into Si      046ij0 

Si   SB\Sj        Toggle sign bit of (Sj), then enter into Si      046ij0 
                  (j≠0) 

Si   #Sj\Sk       Logical equivalence of (Sk) and (Sj) to Si       047ijk 

Si   #Sk†         Transmit ones complement of (Sk) to Si           047i0k 

Si   #Sj\SB†      Logical equivalence of (Sj) and sign bit         047ij0 
                  to Si 

Si   #SB\Sj†      Logical equivalence of (Sj) and sign bit to      047ij0 
                  Si (j≠0) 

Si   #SB†         Enter ones complement of sign bit into Si        047i00 

Si   Sj!Si&Sk     Scalar merge                                     050ijk 

Si   Sj!Si&SB†    Scalar merge of (Si) and sign bit of (Sj)        050ij0 
                  to Si 

Si   Sj!Sk        Logical sum of (Sj) and (Sk) to Si               051ijk 

Si   Sk†          Transmit (Sk) to Si                              051i0k 

Si   Sj!SB†       Logical sum of (Sj) and sign bit to Si           051ij0 

Si   SB!Sj†       Logical sum of (Sj) and sign bit to Si (j≠0)     051ij0 

Si   SB†          Enter sign bit into Si                           051i00 



† Special CAL syntax 



NOTE 



For instructions 044 through 051, SB with no register 
designator is the sign bit, not Shared Address register. 



Instructions 044 through 051 are executed in the Scalar Logical 
functional unit. 

Instruction 044 forms the logical product (AND) of (Sj) and (Sk) and 
enters the result into Si. Bits of Si are set to 1 when 
corresponding bits of (Sj) and (Sk) are 1 as in the following example: 

     (Sj) = 1 1 0 0 
     (Sk) = 1 0 1 0 
     (Si) = 1 0 0 0 

(Sj) is transmitted to Si if the j and k designators have the 
same nonzero value. Si is cleared if the j designator is 0. The 
sign bit of (Sj) is transmitted to Si if the j designator is 
nonzero and the k designator is 0. 

Instruction 045 forms the logical product (AND) of (Sj) and the 
complement of (Sk) and enters the result into Si. Bits of Si are 
set to 1 when corresponding bits of (Sj) and the complement of (Sk) 
are 1 as in the following example where (Sk') = complement of (Sk): 

     if (Sk) = 1 0 1 0 

        (Sj)  = 1 1 0 0 
        (Sk') = 0 1 0 1 
        (Si)  = 0 1 0 0 

Si is cleared if the j and k designators have the same value or if 
the j designator is 0. (Sj) with the sign bit cleared is transmitted 
to Si if the j designator is nonzero and the k designator is 0. 

Instruction 046 forms the logical difference (exclusive OR) of (Sj) and 
(Sk) and enters the result into Si. Bits of Si are set to 1 when 
corresponding bits of (Sj) and (Sk) are different as in the following 
example: 

     (Sj) = 1 1 0 0 
     (Sk) = 1 0 1 0 
     (Si) = 0 1 1 0 

Si is cleared if the j and k designators have the same nonzero 
value. (Sk) is transmitted to Si if the j designator is 0 and the 
k designator is nonzero. The sign bit of (Sj) is complemented and 
the result is transmitted to Si if the j designator is nonzero and 
the k designator is 0. 

Instruction 047 forms the logical equivalence of (Sj) and (Sk) and 
enters the result into Si. Bits of Si are set to 1 when corresponding 
bits of (Sj) and (Sk) are the same as in the following example: 

     (Sj) = 1 1 0 0 
     (Sk) = 1 0 1 0 
     (Si) = 1 0 0 1 

Si is set to all ones if the j and k designators have the same 
nonzero value. The complement of (Sk) is transmitted to Si if the 
j designator is 0 and the k designator is nonzero. All bits except 
the sign bit of (Sj) are complemented and the result is transmitted to 
Si if the j designator is nonzero and the k designator is 0. The 
result is the complement of that produced by instruction 046. 

Instruction 050 merges the contents of (Sj) with (Si) depending on 
the ones mask in Sk. The result is defined by the following Boolean 
equation where Sk' is the complement of Sk as illustrated: 

     (Si) = (Sj)(Sk) + (Si)(Sk') 

     if (Sk)  = 1 1 1 1 0 0 0 0 

        (Sk') = 0 0 0 0 1 1 1 1 
        (Si)  = 1 1 0 0 1 1 0 0 
        (Sj)  = 1 0 1 0 1 0 1 0 

        (Si)  = 1 0 1 0 1 1 0 0 

Instruction 050 is intended for merging portions of 64-bit words into a 
composite word. Bits of Si are cleared when the corresponding bits of 
Sk are 1 if the j designator is 0 and the k designator is nonzero. 
The sign bit of (Sj) replaces the sign bit of Si if the j designator 
is nonzero and the k designator is 0. The sign bit of Si is cleared if 
the j and k designators are both 0. 
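
The scalar merge equation can be checked with a short Python sketch. The function name is hypothetical and unsigned Python integers stand in for the 64-bit registers; the test values are the 8-bit patterns from the example above. 

```python
MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63


def scalar_merge(si: int, sj: int, sk: int) -> int:
    """(Si) = (Sj)(Sk) + (Si)(Sk'): take Sj where the mask is 1, Si elsewhere."""
    return ((sj & sk) | (si & ~sk)) & MASK64
```

Passing sj=0 models the j=0 case (bits of Si under the mask are cleared) and passing sk=SIGN_BIT models the k=0 case (only the sign bit is replaced). 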

Instruction 051 forms the logical sum (inclusive OR) of (Sj) and (Sk) 
and enters the result into Si. Bits of Si are set when at least one 
of the corresponding bits of (Sj) and (Sk) is set as in the following 
example: 

     (Sj) = 1 1 0 0 
     (Sk) = 1 0 1 0 
     (Si) = 1 1 1 0 

(Sj) is transmitted to Si if the j and k designators have the 
same nonzero value. (Sk) is transmitted to Si if the j designator 
is 0 and the k designator is nonzero. (Sj) with the sign bit set to 
1 is transmitted to Si if the j designator is nonzero and the k 
designator is 0. A ones mask consisting of only the sign bit is entered 
into Si if the j and k designators are both 0. 
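
The designator-dependent behavior of instructions 044 through 047 and 051 follows from two operand substitutions: (Sj) reads as 0 when j=0, and (Sk) reads as 2^63 when k=0. The Python sketch below illustrates this under stated assumptions: the helper names are hypothetical, and s_regs is an eight-entry list of unsigned integers standing in for S0 through S7. 

```python
MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63


def operand_j(s_regs, j):
    """(Sj) as seen by the functional unit: 0 when j is 0."""
    return 0 if j == 0 else s_regs[j]


def operand_k(s_regs, k):
    """(Sk) as seen by the functional unit: 2^63 (sign bit only) when k is 0."""
    return SIGN_BIT if k == 0 else s_regs[k]


def scalar_logical(opcode, s_regs, j, k):
    """Result entered into Si by instructions 044-047 and 051."""
    a = operand_j(s_regs, j)
    b = operand_k(s_regs, k)
    if opcode == 0o44:
        result = a & b          # logical product
    elif opcode == 0o45:
        result = a & ~b         # product with complement of (Sk)
    elif opcode == 0o46:
        result = a ^ b          # logical difference
    elif opcode == 0o47:
        result = ~(a ^ b)       # logical equivalence
    elif opcode == 0o51:
        result = a | b          # logical sum
    else:
        raise ValueError("not a two-operand scalar logical opcode")
    return result & MASK64
```

With this model, each special CAL form in the table (sign bit extraction, sign toggle, ones complement, and so on) is simply the general instruction with j or k set to 0. 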

HOLD ISSUE CONDITIONS:   Si reserved 
                         Sj or Sk reserved (except S0) 

EXECUTION TIME:          Instruction issue, 1 CP 
                         Si ready, 1 CP 

SPECIAL CASES:           (Sj)=0 if j=0. 
                         (Sk)=2^63 if k=0. 
INSTRUCTIONS 052 - 055 



CAL Syntax      Description                                      Octal Code 

S0   Si<exp     Shift (Si) left exp=jk places to S0              052ijk 

S0   Si>exp     Shift (Si) right exp=64-jk places to S0          053ijk 

Si   Si<exp     Shift (Si) left exp=jk places to Si              054ijk 

Si   Si>exp     Shift (Si) right exp=64-jk places to Si          055ijk 



Instructions 052 through 055 are executed in the Scalar Shift functional 
unit. They shift values in an S register by an amount specified by 
jk. All shifts are end off with zero fill. 

Instruction 052 shifts (Si) left jk places and enters the result into 
S0. Shift range is 0 through 63 left. 

Instruction 053 shifts (Si) right by 64-jk places and enters the 
result into S0. Shift range is 1 through 64 right. 

Instruction 054 shifts (Si) left jk places and enters the result into 
Si. Shift range is 0 through 63 left. 

Instruction 055 shifts (Si) right by 64-jk places and enters the 
result into Si. Shift range is 1 through 64 right. 
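
The relation between the jk field and the shift count can be sketched in Python as follows. The function names are hypothetical; the same two functions describe 052/054 and 053/055 respectively, since those pairs differ only in the destination register. 

```python
MASK64 = (1 << 64) - 1


def shift_left(si: int, jk: int) -> int:
    """052/054: shift left jk places (0 through 63), end off, zero fill."""
    if not 0 <= jk <= 0o77:
        raise ValueError("jk is a 6-bit field")
    return (si << jk) & MASK64


def shift_right(si: int, jk: int) -> int:
    """053/055: shift right 64-jk places (1 through 64), end off, zero fill."""
    if not 0 <= jk <= 0o77:
        raise ValueError("jk is a 6-bit field")
    return (si & MASK64) >> (64 - jk)
```

Note that jk=0 in the right-shift form encodes a shift of 64 places, which clears the result. 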



HOLD ISSUE CONDITIONS:   Instruction 056, 057, 060, or 061 issued in 
                         previous CP 

                         Si reserved 

                         For instructions 052 and 053, S0 reserved 

EXECUTION TIME:          Instruction issue, 1 CP 

                         For instructions 052 and 053, S0 ready, 2 CPs 

                         For instructions 054 and 055, Si ready, 2 CPs 

SPECIAL CASES:           None 
INSTRUCTIONS 056 - 057 



CAL Syntax        Description                                      Octal Code 

Si   Si,Sj<Ak     Shift (Si) and (Sj) left by (Ak) places to Si    056ijk 

Si   Si,Sj<1†     Shift (Si) and (Sj) left one place to Si         056ij0 

Si   Si<Ak†       Shift (Si) left (Ak) places to Si                056i0k 

Si   Sj,Si>Ak     Shift (Sj) and (Si) right by (Ak) places to Si   057ijk 

Si   Sj,Si>1†     Shift (Sj) and (Si) right one place to Si        057ij0 

Si   Si>Ak†       Shift (Si) right (Ak) places to Si               057i0k 



Instructions 056 and 057 are executed in the Scalar Shift functional 
unit. They shift 128-bit values formed by logically joining two S 
registers. Shift counts are obtained from register Ak. All shift 
counts, (Ak), are considered positive and all 24 bits of (Ak) are 
used for the shift count. A shift of one place occurs if the k 
designator is 0. If j=0, the shifts function as if the shifted value 
were 64 bits rather than 128 bits since the Sj value used is 0. 

The shifts are circular if the shift count does not exceed 64 and the i 
and j designators are equal and nonzero. For instructions 056 and 057, 
(Sj) is unchanged, provided i≠j. For shifts greater than 64, the 
shift is end off with zero fill. If i=j and the shift is greater 
than 64, the shift is the same as if the respective instruction 054 or 
055 was used with a shift count 64 less. 

Instruction 056 performs left shifts of (Si) and (Sj) with (Si) 
initially the most significant bits of the double register. The 
high-order 64 bits of the result are transmitted to Si. Si is 
cleared if the shift count exceeds 127. Instruction 056 produces the 
same result as instruction 054 if the shift count does not exceed 63 and 
the j designator is 0. 

Instruction 057 performs right shifts of (Sj) and (Si) with (Sj) 
initially the most significant bits of the double register. The 
low-order 64 bits of the result are transmitted to Si. Si is cleared 
if the shift count exceeds 127. Instruction 057 produces the same result 
as instruction 055 if the shift count does not exceed 63 and the j 
designator is 0. 
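
The double-register shifts can be modeled by joining the two operands into a 128-bit value. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, count stands for the positive 24-bit value of (Ak), and passing sj=0 models the j=0 case. 

```python
MASK64 = (1 << 64) - 1


def double_shift_left_056(si: int, sj: int, count: int) -> int:
    """056: (Si) is the high half, (Sj) the low half; keep the high 64 bits."""
    pair = ((si & MASK64) << 64) | (sj & MASK64)
    return ((pair << count) >> 64) & MASK64


def double_shift_right_057(si: int, sj: int, count: int) -> int:
    """057: (Sj) is the high half, (Si) the low half; keep the low 64 bits."""
    pair = ((sj & MASK64) << 64) | (si & MASK64)
    return (pair >> count) & MASK64
```

When both halves hold the same value (the i=j case), counts up to 64 produce a rotation, and larger counts behave like a single-register shift by count-64, as described above. 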



† Special CAL syntax 

HOLD ISSUE CONDITIONS:   Si reserved 
                         Sj or Ak reserved (except S0 and/or A0) 

EXECUTION TIME:          Instruction issue, 1 CP 
                         Si ready, 3 CPs 

SPECIAL CASES:           (Sj)=0 if j=0. 

                         (Ak)=1 if k=0. 

                         Circular shift if i=j≠0 and (Ak) is greater 
                         than or equal to 0 and less than or equal to 64. 
INSTRUCTIONS 060 - 061 



CAL Syntax      Description                                      Octal Code 

Si   Sj+Sk      Integer sum of (Sj) and (Sk) to Si               060ijk 

Si   Sj-Sk      Integer difference of (Sj) and (Sk) to Si        061ijk 

Si   -Sk†       Transmit negative of (Sk) to Si                  061i0k 



Instruction 060 forms the integer sum of (Sj) and (Sk) and enters 
the result into Si. No overflow is detected. 

Instruction 061 forms the integer difference of (Sj) and (Sk) and 
enters the result into Si. No overflow is detected. 

Instructions 060 and 061 are executed in the Scalar Add functional unit. 
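
Because no overflow is detected, the results behave as 64-bit wraparound arithmetic. The Python sketch below illustrates this together with the operand substitutions for j=0 and k=0. The helper names are hypothetical, and s_regs is an eight-entry list of unsigned integers standing in for S0 through S7. 

```python
MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63


def operand_j(s_regs, j):
    """(Sj) as used by the unit: 0 when j is 0."""
    return 0 if j == 0 else s_regs[j]


def operand_k(s_regs, k):
    """(Sk) as used by the unit: 2^63 when k is 0."""
    return SIGN_BIT if k == 0 else s_regs[k]


def integer_sum_060(s_regs, j, k):
    """060: 64-bit sum; any carry out of bit 2^63 is discarded."""
    return (operand_j(s_regs, j) + operand_k(s_regs, k)) & MASK64


def integer_difference_061(s_regs, j, k):
    """061: 64-bit twos complement difference, also without overflow detection."""
    return (operand_j(s_regs, j) - operand_k(s_regs, k)) & MASK64
```

Adding or subtracting 2^63 changes only bit 2^63, which is why both instructions complement that bit of (Sj) when k=0. 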



HOLD ISSUE CONDITIONS:   Si reserved 

                         Sj or Sk reserved (except S0) 

EXECUTION TIME:          Instruction issue, 1 CP 

                         Si ready, 3 CPs 

SPECIAL CASES:           (Si)=2^63 if j=0 and k=0. 

                         For instruction 060: 

                           (Si)=(Sk) if j=0 and k≠0. 

                           (Si)=(Sj) with 2^63 complemented if 
                           j≠0 and k=0. 

                         For instruction 061: 

                           (Si)= -(Sk) if j=0 and k≠0. 

                           (Si)=(Sj) with 2^63 complemented if 
                           j≠0 and k=0. 



† Special CAL syntax 
INSTRUCTIONS 062 - 063 



CAL Syntax      Description                                      Octal Code 

Si   Sj+FSk     Floating-point sum of (Sj) and (Sk) to Si        062ijk 

Si   +FSk†      Normalize (Sk) to Si                             062i0k 

Si   Sj-FSk     Floating-point difference of (Sj) and (Sk)       063ijk 
                to Si 

Si   -FSk†      Transmit normalized negative of (Sk) to Si       063i0k 



Instructions 062 and 063 are performed in the Floating-point Add 
functional unit. Operands are assumed to be in floating-point format. 
The result is normalized even if the operands are not normalized. 

Instruction 062 forms the sum of the floating-point quantities in Sj 
and Sk and enters the normalized result into Si. 

Instruction 063 forms the difference of the floating-point quantities in 
Sj and Sk and enters the normalized result into Si. 

Overflow conditions are described in section 4. For floating-point 
operands with the sign bit set (bit=1), zero exponent and zero 
coefficient are treated as 0 (that is, all 64 bits=0).†† 



HOLD ISSUE CONDITIONS:   Si reserved 

                         Sj or Sk reserved (except S0) 

                         Instructions 170 through 173 in process, unit 
                         busy (VL) + 4 CPs 

EXECUTION TIME:          Instruction issue, 1 CP 
                         Si ready, 6 CPs 

SPECIAL CASES:           For instruction 062: 

                           (Si)=(Sk) normalized if (Sk) exponent is 
                           valid, j=0 and k≠0. 

                           (Si)=(Sj) normalized if (Sj) exponent is 
                           valid, j≠0 and k=0. 

                         For instruction 063: 

                           (Si)= -(Sk) normalized if (Sk) exponent is 
                           valid, j=0 and k≠0. Sign of (Si) is 
                           opposite that of (Sk) if (Sk)≠0. 

                           (Si)=(Sj) normalized if (Sj) exponent is 
                           valid, j≠0 and k=0. 



† Special CAL syntax 

†† Considered -0. No floating-point unit generates a -0 except the 
Floating-point Multiply functional unit if one of the operands was a 
-0. Normally, -0 occurs in logical manipulations when a sign is 
attached to a number; that number can be 0. 
INSTRUCTIONS 064 - 067 



CAL Syntax      Description                                      Octal Code 

Si   Sj*FSk     Floating-point product of (Sj) and (Sk) to Si    064ijk 

Si   Sj*HSk     Half-precision rounded floating-point            065ijk 
                product of (Sj) and (Sk) to Si 

Si   Sj*RSk     Rounded floating-point product of (Sj) and       066ijk 
                (Sk) to Si 

Si   Sj*ISk     Reciprocal iteration; 2-(Sj)*(Sk) to Si          067ijk 



Instructions 064 through 067 are executed in the Floating-point Multiply 
functional unit. Operands are assumed to be in floating-point format. 
The result is not guaranteed to be normalized if the operands are not 
normalized. 

Instruction 064 forms the product of the floating-point quantities in 
Sj and Sk and enters the result into Si. 

Instruction 065 forms the half-precision rounded product of the 
floating-point quantities in Sj and Sk and enters the result into 
Si. The low-order 19 bits of the result are cleared. 

Instruction 066 forms the rounded product of the floating-point 
quantities in Sj and Sk and enters the result into Si. 

Instruction 067 forms two minus the product of the floating-point 
quantities in Sj and Sk and enters the result into Si. This instruction 
is used in the divide sequence as described in section 4 under 
Floating-point Arithmetic. 

In the evaluation C = 2-B*A, B must be a reciprocal of A of less than 47 
significant bits and not the exact reciprocal; otherwise, C will be in 
error. The reciprocal produced by the reciprocal approximation 
instruction meets this criterion. 
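
The role of the reciprocal iteration in the divide sequence can be illustrated numerically. The Python sketch below is not a model of the hardware: it uses ordinary Python floats rather than the machine's floating-point format, the function names are hypothetical, and truncate_to_bits is only an assumed stand-in for a reciprocal approximation of limited precision. It does not reproduce the restriction on B stated above. 

```python
import math


def truncate_to_bits(value: float, bits: int) -> float:
    """Keep only the leading `bits` significant bits of a positive value."""
    mantissa, exponent = math.frexp(value)
    kept = math.floor(math.ldexp(mantissa, bits))
    return math.ldexp(kept, exponent - bits)


def reciprocal_iteration(a: float, b: float) -> float:
    """The quantity formed by instruction 067: C = 2 - B*A."""
    return 2.0 - b * a


def refine_reciprocal(a: float, b: float) -> float:
    """One correction step: multiply the approximation B by C."""
    return b * reciprocal_iteration(a, b)
```

If B differs from 1/A by a relative error e, then B*(2 - B*A) differs by about e squared, which is why one iteration and a multiply roughly double the number of correct bits. 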



HOLD ISSUE CONDITIONS:   Si reserved 

                         Sj or Sk reserved (except S0) 

                         Instructions 160 through 167 in process, unit 
                         busy (VL) + 4 CPs 

                         Instructions 140 through 145 in process, Second 
                         Vector Logical unit busy (VL) + 4 CPs 

EXECUTION TIME:          Instruction issue, 1 CP 
                         Si ready, 7 CPs 

SPECIAL CASES:           (Sj)=0 if j=0. 
                         (Sk)=2^63 if k=0. 

                         If both exponent fields are 0, an integer 
                         multiply is performed. Correct integer multiply 
                         results are produced if the following conditions 
                         are met: 

                         • Both operand sign bits are 0. 

                         • The sum of the bits to the right of the 
                           least significant 1 bit in the two operands 
                           is greater than or equal to 48. 

                         The integer result obtained is the high-order 48 
                         bits of the 96-bit product of the two operands. 
INSTRUCTION 070 



CAL Syntax      Description                                      Octal Code 

Si   /HSj       Floating-point reciprocal approximation of       070ij0 
                (Sj) to Si 



Instruction 070 is executed in the Reciprocal Approximation functional 
unit. 

Instruction 070 forms an approximation to the reciprocal of the 
normalized floating-point quantity in Sj and enters the result into 
Si. This instruction occurs in the divide sequence to compute the 
quotient of two floating-point quantities as described in section 4 under 
Floating-point Arithmetic. 

The reciprocal approximation instruction produces a result of 30 
significant bits. The low-order 18 bits are zeros. The number of 
significant bits can be extended to 48 using the reciprocal iteration 
instruction and a multiply. 



HOLD ISSUE CONDITIONS:   Si reserved 

                         Sj reserved (except S0) 

                         Instruction 174 in process, unit busy (VL) + 4 CPs 

EXECUTION TIME:          Instruction issue, 1 CP 

                         Si ready, 14 CPs 

SPECIAL CASES:           (Si) is meaningless if (Sj) is not 
                         normalized; the unit assumes that bit 2^47 of 
                         (Sj)=1; no test is made of this bit. 

                         (Sj)=0 produces a range error; the result is 
                         meaningless. 

                         (Sj)=0 if j=0. 
INSTRUCTION 071 



CAL Syntax      Description                                      Octal Code 

Si   Ak         Transmit (Ak) to Si with no sign extension       071i0k 

Si   +Ak        Transmit (Ak) to Si with sign extension          071i1k 

Si   +FAk       Transmit (Ak) to Si as unnormalized              071i2k 
                floating-point number 

Si   0.6        Transmit constant 0.75 x 2^48 to Si              071i30 

Si   0.4        Transmit constant 0.5 to Si                      071i40 

Si   1.         Transmit constant 1.0 to Si                      071i50 

Si   2.         Transmit constant 2.0 to Si                      071i60 

Si   4.         Transmit constant 4.0 to Si                      071i70 



Instruction 071 performs functions that depend on the value of the j 
designator. The functions are concerned with transmitting information 
from an A register to an S register and with generating frequently used 
floating-point constants. 

When the j designator is 0, the 24-bit value in Ak is transmitted to 
Si. The value is treated as an unsigned integer. The high-order bits 
of Si are zeros. 

When the j designator is 1, the 24-bit value in Ak is transmitted to 
Si. The value is treated as a signed integer. The sign bit of Ak is 
extended through the high-order bit of Si. 

When the j designator is 2, the 24-bit value in Ak is transmitted to 
Si as an unnormalized floating-point quantity (the result is then added 
to 0 to normalize). For this instruction, the exponent in bits 
2^62 through 2^48 is set to 40060₈. The sign of the coefficient is 
set according to the sign of Ak. If the sign bit of Ak is set, the 
twos complement of Ak is entered into Si as the magnitude of the 
coefficient and bit 2^63 of Si is set for the sign of the coefficient. 
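
The three transmit forms (j = 0, 1, and 2) can be sketched in Python as follows. The function names are hypothetical, and it is an assumption of the sketch that the coefficient magnitude occupies the low-order 24 bits of Si when j=2, which is consistent with the exponent of 40060₈ given above. 

```python
MASK24 = (1 << 24) - 1
MASK64 = (1 << 64) - 1


def transmit_unsigned(ak: int) -> int:
    """j=0: 24-bit value with zeros in the high-order bits of Si."""
    return ak & MASK24


def transmit_sign_extended(ak: int) -> int:
    """j=1: bit 2^23 of Ak is copied through bit 2^63 of Si."""
    ak &= MASK24
    return ak | (MASK64 ^ MASK24) if ak >> 23 else ak


def transmit_unnormalized(ak: int) -> int:
    """j=2: sign, exponent 40060 (octal), and magnitude of Ak as coefficient."""
    ak &= MASK24
    negative = ak >> 23
    magnitude = (-ak) & MASK24 if negative else ak
    return (negative << 63) | (0o40060 << 48) | magnitude
```

For example, an Ak holding -3 yields a word with the sign bit set and a coefficient of 3; a subsequent floating-point add of 0 would normalize it. 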

A sequence of instructions is used to convert an integer whose absolute 
value is less than 24 bits to floating-point format: 

     CAL code:     A1     S1 
                   S1     +FA1 
                   S1     +FS1 

     9 CPs required 


When the j designator is 3, the floating-point constant of 0.75 x 2^48 
is entered into Si (0 40060 6000 0000 0000 0000₈). This constant is 
used to create floating-point numbers from integer numbers (positive and 
negative) whose absolute value is less than 47 bits. A sequence of 
instructions is used for conversion of an integer in S1: 

     CAL code:     S2     0.6 
                   S1     S2-S1 
                   S1     S2-FS1 

     11 CPs required 



When the j designator is 4, the floating-point constant 0.5 
(= 0 40000 4000 0000 0000 0000₈) is entered into Si. 

When the j designator is 5, the floating-point constant 1.0 
(= 0 40001 4000 0000 0000 0000₈) is entered into Si. 

When the j designator is 6, the floating-point constant 2.0 
(= 0 40002 4000 0000 0000 0000₈) is entered into Si. 

When the j designator is 7, the floating-point constant 4.0 
(= 0 40003 4000 0000 0000 0000₈) is entered into Si. 
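
The octal constants can be checked by decoding them. The Python sketch below assumes the word layout implied by the values above: a sign in bit 2^63, a 15-bit exponent in bits 2^62 through 2^48 biased by 40000₈, and a 48-bit coefficient read as a binary fraction. The names cray_to_float and CONSTANTS are hypothetical. 

```python
import math

EXPONENT_BIAS = 0o40000
COEFFICIENT_MASK = (1 << 48) - 1


def cray_to_float(word: int) -> float:
    """Decode a 64-bit word under the assumed sign/exponent/coefficient layout."""
    sign = -1.0 if word >> 63 else 1.0
    exponent = ((word >> 48) & 0x7FFF) - EXPONENT_BIAS
    coefficient = word & COEFFICIENT_MASK      # fraction scaled by 2^48
    return sign * math.ldexp(coefficient, exponent - 48)


# Constants entered by 071 for j = 3 through 7, written as in the text.
CONSTANTS = {
    3: 0o0_40060_6000_0000_0000_0000,
    4: 0o0_40000_4000_0000_0000_0000,
    5: 0o0_40001_4000_0000_0000_0000,
    6: 0o0_40002_4000_0000_0000_0000,
    7: 0o0_40003_4000_0000_0000_0000,
}
```

Decoding each entry reproduces the decimal value named in the text; for j=3 the coefficient 0.6₈ is 0.75 and the exponent 60₈ is 48. 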



HOLD ISSUE CONDITIONS:   Si reserved 

                         Ak reserved (except A0); applies to all forms 
                         of the instruction, that is, j designators 
                         0 through 7. 

EXECUTION TIME:          Instruction issue, 1 CP 

                         Si ready, 2 CPs 

SPECIAL CASES:           (Ak)=1 if k=0. 

                         (Si)=(Ak) if j=0. 

                         (Si)=(Ak) sign extended if j=1. 

                         (Si)=(Ak) unnormalized if j=2. 

                         (Si)=0.6 x 2^60 (octal) if j=3. 

                         (Si)=0.4 x 2^0 (octal) if j=4. 

                         (Si)=0.4 x 2^1 (octal) if j=5. 

                         (Si)=0.4 x 2^2 (octal) if j=6. 

                         (Si)=0.4 x 2^3 (octal) if j=7. 
INSTRUCTIONS 072 - 075 



CAL Syntax      Description                                      Octal Code 

Si   RT         Transmit (RTC) to Si                             072i00 

Si   SM         Read semaphores to Si                            072i02 

Si   STj        Read (STj) register to Si                        072ij3 

Si   VM         Transmit (VM) to Si                              073i00 

†               Read performance counter into Si                 073i11 

†               Increment performance counter                    073i21 

†               Clear all maintenance modes                      073i31 

Si   SRj        Transmit (SRj) to Si; j=0                        073ij1 

SM   Si         Load semaphores from Si                          073i02 

STj  Si         Load (STj) register from Si                      073ij3 

Si   Tjk        Transmit (Tjk) to Si                             074ijk 

Tjk  Si         Transmit (Si) to Tjk                             075ijk 



Instruction 072i00 enters the 64-bit value of the real-time clock (RTC) 
into Si. The clock is incremented by 1 each CP. The RTC can be set 
only by the monitor through use of instruction 0014j0. 

Instruction 072i02 enters the values of all of the semaphores into 
Si. The 32-bit SM register is left justified in Si with SM00 
occupying the sign bit. 
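
The left-justified placement of the semaphores can be sketched in Python. The function names are hypothetical, and the sketch assumes that the semaphores are numbered from left to right, so that semaphore n occupies bit 2^(63-n) of Si; the text states this only for SM00. 

```python
def semaphores_to_s(sm_flags):
    """072i02: place 32 semaphore flags in the high-order 32 bits of Si."""
    if len(sm_flags) != 32:
        raise ValueError("the SM register holds 32 semaphores")
    word = 0
    for n, flag in enumerate(sm_flags):
        if flag:
            word |= 1 << (63 - n)      # SM00 lands on the sign bit
    return word


def s_to_semaphores(si):
    """073i02: recover the 32 flags from the high-order 32 bits of Si."""
    return [bool((si >> (63 - n)) & 1) for n in range(32)]
```

The low-order 32 bits of the word read back are zero, and loading ignores them. 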

Instruction 072ij3 enters the contents of STj into Si. 

Instruction 073i00 enters the 64-bit value of the VM register into 
Si. The VM register is usually read after being set by instruction 175. 

Instruction 073i11 is used for performance monitoring and is privileged 
to monitor mode. Each execution of the 073i11 instruction advances a 
pointer and enters either the high-order or low-order bits of a 
performance counter into the high-order bits of Si. See Appendix C for 
information on performance monitoring. 



† Not supported at this time 


Instructions 073i21 and 073i31 are part of the SECDED maintenance 
mode functions and are executed only if the maintenance mode switch on 
the mainframe's control panel is on. Instruction 073i21 enables 
certain data bits to replace the 8 check bits used for SECDED as they are 
written into memory for any subsequent write to memory (except for I/O 
write to memory). Instruction 073i31 clears all three SECDED 
maintenance mode instructions: 001501, 001521, and 001531. See Appendix 
D for complete information on the SECDED maintenance modes. 

Instruction 073ij1 enters the contents of the Status register SRj into 
Si. Instruction 073i01 returns the following status to the high-order 
bits of Si: 

     Si Bit    Description 

     2^63      Clustered, CLN ≠ 0 (CL) 
     2^        Program state (PS) 
     2^^       Floating-point error occurred (FPS) 
     2^^       Floating-point interrupt enabled (IFP) 
     2^'       Operand range interrupt enabled (IOR) 
     2^^       Bidirectional memory enabled (BDM) 
     2^41†     Processor number bit 1 (PN1) 
     2^40†     Processor number bit 0 (PN0) 
     2^34†     Cluster number bit 2 (CLN2) 
     2^33†     Cluster number bit 1 (CLN1) 
     2^32†     Cluster number bit 0 (CLN0) 

Instruction 073i02 sets the semaphores from the 32 high-order bits of 
Si. SM00 receives the sign bit of Si. 

Instruction 073ij3 enters the contents of Si into STj. 

Instruction 074 enters the contents of Tjk into Si. 

Instruction 075 enters the contents of Si into Tjk. 

† These bit positions return a value of zero if not executed in monitor 
mode. 

HOLD ISSUE CONDITIONS:   Si reserved 

                         For instructions 074 and 075, instructions 036 
                         through 037 in process 

                         For instruction 074, instruction 075 issued in 
                         the previous CP 

                         For instruction 073i00: 

                           Instruction 14x or 175 in process, VM busy 
                           for (VL) + 5 CPs 

                           Instruction 003 in process, VM busy for 1 CP 

                         For instructions 072ij3, 073ij3, and 
                         073i02, hold issue 1 CP, then 2+†† CP more 
                         after Si not reserved. Minimum 3 CP hold. 

EXECUTION TIME:          Instruction issue, 1 CP 

                         All cases except 073ij3, result register ready, 
                         1 CP 

                         For 073i02, SM ready, 1 CP 

SPECIAL CASES:           For instructions 072i02 and 072ij3, (Si)=0 
                         if CLN=0. 

                         Instructions 073i02 and 073ij3 are no-ops if 
                         CLN=0. 



†† If more than one CPU attempts to access semaphores or shared 
registers in the same clock period, a scanner will resolve the 
conflict. See shared register explanation in section 2. 
INSTRUCTIONS 076 - 077 



CAL Syntax      Description                                      Octal Code 

Si     Vj,Ak    Transmit (Vj element (Ak)) to Si                 076ijk 

Vi,Ak  Sj       Transmit (Sj) to Vi element (Ak)                 077ijk 

Vi,Ak  0†       Clear Vi element (Ak)                            077i0k 



Instructions 076 and 077 transmit a 64-bit quantity between a V register 
element and an S register. 

Instruction 076 transmits the contents of an element of register Vj to 
Si. 

Instruction 077 transmits the contents of register Sj to an element of 
register Vi. 

The low-order 6 bits of (Ak) determine the vector element for either 
instruction. 
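
Element selection can be sketched in Python as follows. The function names are hypothetical, and each V register is represented by a 64-entry list. 

```python
VECTOR_ELEMENTS = 64


def element_index(ak: int) -> int:
    """Only the low-order 6 bits of (Ak) select the element."""
    return ak & 0o77


def read_element_076(vj, ak):
    """076: value of the selected element of Vj, destined for Si."""
    return vj[element_index(ak)]


def write_element_077(vi, ak, sj):
    """077: store (Sj) into the selected element of Vi, leaving the rest."""
    vi[element_index(ak)] = sj
```

Because the upper bits of (Ak) are ignored, a value of 101₈ in Ak addresses element 1, the same element as a value of 1. 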



HOLD ISSUE CONDITIONS:   Ak reserved (except A0) 

                         For instruction 076, Si reserved or Vj 
                         reserved as operand or as result 

                         For instruction 077, Vi reserved as operand or 
                         as result or Sj reserved 

EXECUTION TIME:          Instruction issue, 1 CP 

                         For instruction 076, Si ready, 4 CPs 

                         For instruction 077, Vi ready, 1 CP 

SPECIAL CASES:           (Sj)=0 if j=0. 

                         (Ak)=1 if k=0. 



† Special CAL syntax 
INSTRUCTIONS 10h - 13h 



CAL Syntax        Description                              Octal Code 

Ai   exp,Ah       Read from ((Ah) + jkm) to Ai             10hijkm 
Ai   exp,0†       Read from (jkm) to Ai                    100ijkm 
Ai   exp,†        Read from (jkm) to Ai                    100ijkm 
Ai   ,Ah†         Read from (Ah) to Ai                     10hi00 

exp,Ah  Ai        Store (Ai) to (Ah) + jkm                 11hijkm 
exp,0   Ai†       Store (Ai) to jkm                        110ijkm 
exp,    Ai†       Store (Ai) to exp                        110ijkm 
,Ah     Ai†       Store (Ai) to (Ah)                       11hi00 

Si   exp,Ah       Read from ((Ah) + jkm) to Si             12hijkm 
Si   exp,0†       Read from (exp) to Si                    120ijkm 
Si   exp,†        Read from (exp) to Si                    120ijkm 
Si   ,Ah†         Read from (Ah) to Si                     12hi00 

exp,Ah  Si        Store (Si) to (Ah) + jkm                 13hijkm 
exp,0   Si†       Store (Si) to exp                        130ijkm 
exp,    Si†       Store (Si) to exp                        130ijkm 
,Ah     Si†       Store (Si) to (Ah)                       13hi00 



† Special CAL syntax 

The 2-parcel instructions 10h through 13h transmit data between 
memory and an A register or an S register. 

If the enhanced addressing mode bit in the Exchange Package is not set, 
the content of Ah (treated as a 22-bit signed integer) is added to the 
signed 22-bit integer in the jkm field to determine the memory 
address. Data base address bits 2^22 and 2^23 will determine which 4 
million words of memory will be used. If the enhanced addressing mode 
bit (EAM) of the Exchange Package is set, the content of Ah (treated as 
a 24-bit integer) is added to the sign-extended 24-bit integer in the 
jkm field to determine the memory address. 

If h is 0, (Ah) is 0 and only the jkm field is used for the address. 
The address arithmetic is performed by an address adder similar to but 
separate from the Address Add functional unit. 

Instructions 10h and 11h transmit 24-bit quantities to or from A 
registers. When transmitting data from memory to an A register, the 
high-order 40 bits of the memory word are ignored. On a store from Ai 
into memory, the high-order 40 bits of the memory word are zeroed. 

Instructions 12h and 13h transmit 64-bit quantities to or from 
register Si. 



HOLD ISSUE CONDITIONS:   Port A, B, or C busy 

                         Ah reserved or busy previous CP 

                         For instructions 10h and 11h, Ai reserved 

                         For instructions 12h and 13h, Si reserved 

                         Instructions 10x through 13x in CP 2 and CP 
                         3 and conflict 

                         Second parcel not in a buffer 

                         Second parcel in different buffer, 2 CP 

EXECUTION TIME:          Instruction issue: 

                           Both parcels in same buffer, 2 CPs 

                         For instruction 10h, Ai ready, 14 CPs 

                         For instruction 12h, Si ready, 14 CPs 

                         Bank ready for next scalar read or store, 4 CPs 

                         NOTE 

                         After issuing instructions 10h through 13h, 
                         attempting to issue instructions 034 through 037, 
                         176, or 177 causes Ports A, B, or C to be 
                         considered busy for 4 CPs (plus additional CPs if 
                         there are conflicts). 

SPECIAL CASES:           If the enhanced addressing mode bit (EAM) of the 
                         Exchange Package is set, the jkm field is 
                         sign-extended to 24 bits. 
INSTRUCTIONS 140 - 147 



CAL Syntax        Description                                      Octal Code 

Vi   Sj&Vk        Logical products of (Sj) and (Vk elements)       140ijk 
                  to Vi elements 

Vi   Vj&Vk        Logical products of (Vj elements) and            141ijk 
                  (Vk elements) to Vi elements 

Vi   Sj!Vk        Logical sums of (Sj) and (Vk elements)           142ijk 
                  to Vi elements 

Vi   Vk†          Transmit (Vk elements) to Vi elements            142i0k 

Vi   Vj!Vk        Logical sums of (Vj elements) and                143ijk 
                  (Vk elements) to Vi elements 

Vi   Sj\Vk        Logical differences of (Sj) and (Vk elements)    144ijk 
                  to Vi elements 

Vi   Vj\Vk        Logical differences of (Vj elements) and         145ijk 
                  (Vk elements) to Vi elements 

Vi   0†           Clear Vi elements                                145iii 

Vi   Sj!Vk&VM     If VM bit=1, transmit (Sj) to the                146ijk 
                  corresponding element in Vi 
                  If VM bit=0, transmit the (corresponding 
                  Vk element) to the (corresponding Vi element) 

Vi   #VM&Vk†      If VM bit=1, transmit (0) to the corresponding   146i0k 
                  element in Vi 
                  If VM bit=0, transmit the (corresponding 
                  Vk element) to the (corresponding Vi element) 

Vi   Vj!Vk&VM     If VM bit=1, transmit the (corresponding Vj      147ijk 
                  element) to the (corresponding Vi element) 
                  If VM bit=0, transmit the (corresponding Vk 
                  element) to the (corresponding Vi element) 



† Special CAL syntax 

Instructions 140 through 145 can be executed in either the Full Vector 
Logical or the Second Vector Logical functional unit, provided the 
Second Vector Logical unit is enabled. If the Second Vector Logical unit 
is disabled, instructions 140 through 145 can be executed only in the 
Full Vector Logical unit. Instructions 146 and 147 execute in the 
Full Vector Logical unit only. The number of operations performed is 
determined by the contents of the VL register. All operations start with 
element 0 of the Vi, Vj, or Vk register and increment the element number 
by 1 for each operation performed. All results are delivered to Vi. 

For instructions 140, 142, 144, and 146, a copy of the content of Sj' is 
delivered to the functional unit. The copy of the content is held as one 
of the operands until completion of the operation. Therefore, Sj can 
be changed immediately without affecting the vector operation. For 
instructions 141, 143, 145, and 147, all operands are obtained from V 
registers. 

Instructions 140 and 141 form the logical products (AND) of operand pairs 
and enter the result into Vi. Bits of an element of Vi are set to 1 
when the corresponding bits of (Sj) or (Vj element) and (Vfe element) 
are 1 as in the following: 

(Sj) or (Vj element) =110 
(Vk element) = 10 10 
(Vi element) =10 

Instructions 142 and 143 form the logical sums (inclusive OR) of operand 
pairs and deliver the results to Vi. Bits of an element of vi are set 
to 1 when one of the corresponding bits of (Sj) or (Vj element) and 
(Vfe element) is 1 as in the following: 

(Sj) or (Vj element) =110 
(Vk element) = 10 10 
(Vi element) =1110 

Instructions 144 and 145 form the logical differences (exclusive OR) of 
operand pairs and deliver the results of Vi. Bits of an element are set 
to 1 when the corresponding bit of (Sj) or (VJ element) is different 
from (Vk element) as in the following: 

(Sj) or (Vj element) =110 
(Vk element) = 10 10 
(Vi element) =0110 

Instructions 146 and 147 transmit operands to Vi depending on the
contents of the VM register. Bit 2^63 of the mask corresponds to element
0 of a V register. Bit 2^0 corresponds to element 63. Operand pairs
used for the selection depend on the instruction. For instruction 146,
the first operand is always (Sj) and the second operand is (Vk element).
For instruction 147, the first operand is (Vj element) and the second
operand is (Vk element). If bit n of the vector mask is 1, the first
operand is transmitted; if bit n of the mask is 0, the second operand,
(Vk element), is selected.
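
The logical and merge operations described above can be sketched in
Python. This is an illustrative model only, not CAL and not a description
of the hardware: the function names, the use of Python lists for V
registers, and passing a plain integer for the Sj copy are all assumptions
made for the sketch. Only the first (VL) elements are produced; the
remaining elements of Vi are unaltered on the machine.

```python
MASK64 = (1 << 64) - 1  # V register elements are 64 bits wide


def vector_logical(op, first, vk, vl):
    """Model of 140-145: 'first' is an int (copy of Sj) or a list (Vj)."""
    ops = {
        "and": lambda a, b: a & b,   # instructions 140, 141
        "or": lambda a, b: a | b,    # instructions 142, 143
        "xor": lambda a, b: a ^ b,   # instructions 144, 145
    }
    out = []
    for n in range(vl):              # elements 0 .. (VL)-1
        a = first[n] if isinstance(first, list) else first
        out.append(ops[op](a & MASK64, vk[n] & MASK64))
    return out


def vector_merge(first, vk, vm, vl):
    """Model of 146/147: VM bit 2^63 selects for element 0, 2^0 for element 63."""
    out = []
    for n in range(vl):
        a = first[n] if isinstance(first, list) else first
        mask_bit = (vm >> (63 - n)) & 1
        out.append(a if mask_bit else vk[n])  # 1: first operand, 0: (Vk element)
    return out
```

With (VM) = 0 60000 0000 0000 0000 0000 (octal), written here as
`0o6 << 60`, the two merge calls reproduce the register contents given in
the examples for instructions 146 and 147.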

Examples:

1. If instruction 146 is to be executed and the following register 
conditions exist: 

(VL) = 4 

(VM) = 0 60000 0000 0000 0000 0000

(S2) = -1 

(V600) = 1 

(V601) = 2 

(V602) = 3 

(V603) = 4 

Instruction 146726 is executed. Following execution, the first four 
elements of V7 contain the following values: 

(V700) = 1 

(V701) = -1 

(V702) = -1 

(V703) = 4 

The remaining elements of V7 are unaltered. 

2. If instruction 147 is to be executed and the following register 
conditions exist: 

(VL) = 4

(VM) = 0 60000 0000 0000 0000 0000

(V200) = 1 (V300) = -1 

(V201) = 2 (V301) = -2 

(V202) = 3 (V302) = -3 

(V203) = 4 (V303) = -4 

Instruction 147123 is executed. Following execution, the first four
elements of V1 contain the following values:

(V100) = -1

(V101) = 2

(V102) = 3

(V103) = -4

The remaining elements of V1 are unaltered.

HOLD ISSUE CONDITIONS:  Vk reserved as operand

                        Vi reserved as operand or result

                        For instructions 140, 142, 144, and 146, Sj
                        reserved

                        For instructions 141, 143, 145, and 147, Vj
                        reserved as operand

                        For instructions 146 and 147, or instructions 140
                        through 145 with Second Vector Logical unit
                        disabled:

                          Instruction 14x or 175 in process. Full
                          Vector Logical unit busy (VL) + 4 CPs

                        For instructions 140 through 145 with Second
                        Vector Logical unit enabled:

                          See discussion of Second Vector Logical issue
                          in section 4.

                          Instruction 140 through 145 or 16x in progress
                          in Second Vector Logical/Floating-point Multiply
                          unit. Second Vector Logical unit busy
                          (VL) + 4 CPs

                          Instruction 140 through 147 or 175 in progress in
                          Full Vector Logical unit. Full Vector Logical
                          unit busy (VL) + 4 CPs

EXECUTION TIME:         Instruction issue, 1 CP

                        Vj or Vk ready in (VL) + 3 CPs if data
                        available†

                        Vi ready in (VL) + 7 CPs if data available†
                        for the Full Vector Logical unit; 9 CPs if
                        available for the Second Vector Logical unit.

                        Unit ready, (VL) + 4 CPs if data available†

SPECIAL CASES:          (Sj)=0 if j=0.



† Vector instructions may or may not start execution immediately; they
  execute as data becomes available. In particular, a memory conflict
  that slows execution of some elements of a vector load can cause
  delays in all instructions in the operation chain, starting with that
  load.
INSTRUCTIONS 150 - 151



CAL Syntax     Description                                   Octal Code

Vi Vj<Ak       Shift (Vj elements) left by (Ak) places       150ijk
               to Vi elements

Vi Vj<1†       Shift (Vj elements) left one place to         150ij0
               Vi elements

Vi Vj>Ak       Shift (Vj elements) right by (Ak) places      151ijk
               to Vi elements

Vi Vj>1†       Shift (Vj elements) right one place to        151ij0
               Vi elements



Instructions 150 and 151 are executed in the Vector Shift functional
unit. The number of operations performed is determined by the contents
of the VL register. Operations start with element 0 of the Vi and Vj
registers and end with elements specified by (VL)-1.

All shifts are end off with zero fill. The shift count is obtained from
(Ak) and all 24 bits of Ak are used for the shift count. Elements of
Vi are cleared if the shift count exceeds 63. All shift counts (Ak)
are considered positive.

Unlike shift instructions 052 through 055, these instructions receive the
shift count from Ak rather than the jk fields.
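
The end-off, zero-fill behavior can be modeled as follows. This is a
sketch under stated assumptions: the function name and the list
representation of Vj are hypothetical, and the caller is expected to pass
1 for the shift count when k=0, as noted under SPECIAL CASES.

```python
MASK64 = (1 << 64) - 1
MASK24 = (1 << 24) - 1


def vector_shift(vj, ak, vl, left=True):
    """Model of 150 (left) and 151 (right) for elements 0 .. (VL)-1."""
    count = ak & MASK24              # all 24 bits of Ak, treated as positive
    out = []
    for n in range(vl):
        word = vj[n] & MASK64
        if count > 63:
            out.append(0)            # element is cleared
        elif left:
            out.append((word << count) & MASK64)   # bits shifted off the end are lost
        else:
            out.append(word >> count)              # zero fill from the left
    return out
```

For example, `vector_shift([1, 1 << 63], 1, 2)` returns `[2, 0]`: the
high-order bit of the second element is shifted off the end.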



HOLD ISSUE CONDITIONS:  Vj reserved as operand

                        Vi reserved as operand or result

                        Ak reserved (except A0)

                        Instructions 150 through 153 in process, unit
                        busy (VL) + 4 CPs††

EXECUTION TIME:         Vj ready in (VL) + 3 CPs if data available††

                        Vi ready in (VL) + 8 CPs if data available††

                        Unit ready, (VL) + 4 CPs if data available††

SPECIAL CASES:          (Ak)=1 if k=0.



†  Special CAL syntax

†† Vector instructions may or may not start execution immediately;
   they execute as data becomes available. In particular, a memory
   conflict that slows execution of some elements of a vector load can
   cause delays in all instructions in the operation chain, starting with
   that load.
INSTRUCTIONS 152 - 153



CAL Syntax     Description                                   Octal Code

Vi Vj,Vj<Ak    Double shifts of (Vj elements) left (Ak)      152ijk
               places to Vi elements

Vi Vj,Vj<1†    Double shifts of (Vj elements) left one       152ij0
               place to Vi elements

Vi Vj,Vj>Ak    Double shifts of (Vj elements) right (Ak)     153ijk
               places to Vi elements

Vi Vj,Vj>1†    Double shifts of (Vj elements) right one      153ij0
               place to Vi elements



Instructions 152 and 153 are executed in the Vector Shift functional
unit. The instructions shift 128-bit values formed by logically joining
the contents of two elements of the Vj register. The direction of the
shift determines whether the high-order bits or the low-order bits of the
result are sent to Vi. Shift counts are obtained from register Ak.

All shifts are end off with zero fill.

The number of operations is determined by the contents of the VL register.

Instruction 152 performs left shifts. The operation starts with element
0 of Vj. If (VL) is 1, element 0 is joined with 64 bits of 0, and the
resulting 128-bit quantity is then shifted left by the amount specified
by (Ak). Only the one operation is performed. The 64 high-order bits
remaining are transmitted to element 0 of Vi.

If (VL) is 2, the operation starts with element 0 of Vj being joined
with element 1, and the resulting 128-bit quantity is then shifted left
by the amount specified by (Ak). The high-order 64 bits remaining are
transmitted to element 0 of Vi. Figure 5-7 illustrates this operation.

If (VL) is greater than 2, the operation continues by joining element 1
with element 2 and transmitting the 64-bit result to element 1 of Vi.
Figure 5-8 illustrates this operation.

If (VL) is 2, element 1 is joined with 64 bits of 0 and only two
operations are performed. In general, the last element of Vj as
determined by (VL) is joined with 64 bits of zeros. Figure 5-9
illustrates this operation.
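
The left double shift can be sketched as below. The function name and the
list representation of Vj are assumptions made for illustration; the
joining rule (element n as the high-order half, element n+1 or zeros as
the low-order half) follows the description above.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def vector_double_shift_left(vj, ak, vl):
    """Model of instruction 152 for elements 0 .. (VL)-1."""
    out = []
    for n in range(vl):
        high = vj[n] & MASK64
        # The last element, as determined by (VL), is joined with 64 bits of 0.
        low = (vj[n + 1] & MASK64) if n + 1 < vl else 0
        joined = (high << 64) | low
        shifted = (joined << ak) & MASK128   # end off, zero fill
        out.append(shifted >> 64)            # high-order 64 bits go to Vi
    return out
```

Called with the register contents of the instruction 152 example, that is
(VL) = 4, (A1) = 3 and the four V4 elements given there, this model
returns octal 73, 54, 67 and 70.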



† Special CAL syntax
Figure 5-7. Vector left double shift, first element, VL greater than 1



Figure 5-8. Vector left double shift, second element, VL greater than 2



Figure 5-9. Vector left double shift, last element



† Elements are numbered 0 through 63 in the V registers; therefore,
  element (VL)-1 refers to the VLth element.

If (Ak) is greater than or equal to 128, the result is all zeros. If
(Ak) is greater than 64, the result register contains at least (Ak) - 64
zeros.



Examples:

1. If instruction 152 is to be executed and the following register
conditions exist:

(VL) = 4

(A1) = 3

(V400) = 0 00000 0000 0000 0000 0007

(V401) = 0 60000 0000 0000 0000 0005

(V402) = 1 00000 0000 0000 0000 0006

(V403) = 1 60000 0000 0000 0000 0007

Instruction 152541 is executed. Following execution, the first four
elements of V5 contain the following values:

(V500) = 0 00000 0000 0000 0000 0073

(V501) = 0 00000 0000 0000 0000 0054

(V502) = 0 00000 0000 0000 0000 0067

(V503) = 0 00000 0000 0000 0000 0070

Instruction 153 performs right shifts. The original element 0 of
Vj is joined with 64 high-order bits of 0 and the 128-bit quantity
is shifted right by the amount specified by (Ak). The 64
low-order bits of the result are transmitted to element 0 of Vi.
Figure 5-10 illustrates this operation.



Figure 5-10. Vector right double shift, first element



If (VL)=1, only one operation is performed. In general, however,
instruction execution continues by joining element 0 with element 1,
shifting the 128-bit quantity by the amount specified by (Ak), and
transmitting the result to element 1 of Vi. This operation is
shown in figure 5-11.



Figure 5-11. Vector right double shift, second element,
VL greater than 1



The last operation performed by the instruction joins the last
element of Vj as determined by (VL) with the preceding element.
Figure 5-12 illustrates this operation.



Figure 5-12. Vector right double shift, last operation

2. If an instruction 153 is to be executed and the following register
conditions exist:

(VL) = 4

(A6) = 3

(V200) = 0 00000 0000 0000 0000 0017

(V201) = 0 60000 0000 0000 0000 0006

(V202) = 1 00000 0000 0000 0000 0006

(V203) = 1 60000 0000 0000 0000 0007

Instruction 153026 is executed and following execution, register V0
contains the following values:

(V000) = 0 00000 0000 0000 0000 0001

(V001) = 1 66000 0000 0000 0000 0000

(V002) = 1 50000 0000 0000 0000 0000

(V003) = 1 56000 0000 0000 0000 0000



The remaining elements of V0 are unaltered.
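
The right double shift can be modeled in the same way as the left double
shift. As before, the function name and list representation are
assumptions for illustration. Element n forms the low-order half of the
128-bit quantity and the preceding element (or zeros for element 0) forms
the high-order half.

```python
MASK64 = (1 << 64) - 1


def vector_double_shift_right(vj, ak, vl):
    """Model of instruction 153 for elements 0 .. (VL)-1."""
    out = []
    for n in range(vl):
        # Element 0 is joined with 64 high-order bits of 0.
        high = (vj[n - 1] & MASK64) if n > 0 else 0
        low = vj[n] & MASK64
        joined = (high << 64) | low
        out.append((joined >> ak) & MASK64)  # low-order 64 bits go to Vi
    return out
```

Called with (VL) = 4, a shift count of 3 and the four V2 elements of
example 2, the model returns the four V0 values listed above.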



HOLD ISSUE CONDITIONS:  Vj reserved as operand

                        Vi reserved as operand or result

                        Ak reserved (except A0)

                        Instructions 150 through 153 in process, unit
                        busy (VL) + 4 CPs†

EXECUTION TIME:         Instruction issue, 1 CP

                        Vj ready in (VL) + 3 CPs if data available†

                        For instruction 152, Vi ready in (VL) + 9 CPs
                        if data available†

                        For instruction 153, Vi ready in (VL) + 8 CPs if
                        data available†

                        Unit ready, (VL) + 4 CPs if data available†

SPECIAL CASES:          (Ak)=1 if k=0.



† Vector instructions may or may not start execution immediately; they
  execute as data becomes available. In particular, a memory conflict
  that slows execution of some elements of a vector load can cause
  delays in all instructions in the operation chain, starting with that
  load.
INSTRUCTIONS 154 - 157



CAL Syntax     Description                                   Octal Code

Vi Sj+Vk       Integer sums of (Sj) and (Vk elements) to     154ijk
               Vi elements

Vi Vj+Vk       Integer sums of (Vj elements) and             155ijk
               (Vk elements) to Vi elements

Vi Sj-Vk       Integer differences of (Sj) and               156ijk
               (Vk elements) to Vi elements

Vi -Vk†        Transmit negative of (Vk elements) to Vi      156i0k
               elements

Vi Vj-Vk       Integer differences of (Vj elements) and      157ijk
               (Vk elements) to Vi elements



Instructions 154 through 157 are executed in the Vector Add functional 
unit. 

Instructions 154 and 155 perform integer addition. Instructions 156 and
157 perform integer subtraction. The number of additions or subtractions
performed is determined by the contents of the VL register. All
operations start with element 0 of the V registers and increment the
element number by 1 for each operation performed. All results are
delivered to elements of Vi. No overflow is detected.

Instructions 154 and 156 deliver a copy of (Sj) to the functional unit 
where the copy is retained as one of the operands until the vector 
operation completes. The other operand is an element of Vk. For 
instructions 155 and 157, both operands are obtained from V registers. 
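
A minimal sketch of the vector integer operations follows. The function
name and list representation are hypothetical. It assumes 64-bit two's
complement wraparound for results, which is one way to model "no overflow
is detected".

```python
MASK64 = (1 << 64) - 1


def vector_int_op(first, vk, vl, subtract=False):
    """Model of 154-157: 'first' is an int (copy of Sj) or a list (Vj)."""
    out = []
    for n in range(vl):              # elements 0 .. (VL)-1
        a = first[n] if isinstance(first, list) else first
        b = vk[n]
        result = a - b if subtract else a + b
        out.append(result & MASK64)  # wrap silently; overflow is not reported
    return out
```

Passing 0 as the scalar operand with `subtract=True` models the special
syntax form 156i0k, which transmits the negative of each Vk element.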



HOLD ISSUE CONDITIONS:  Vk reserved as operand

                        Vi reserved as operand or result

                        Instructions 154 through 157 in process, unit
                        busy (VL) + 4 CPs††

                        For instructions 154 and 156, Sj reserved
                        (except S0)

                        For instructions 155 and 157, Vj reserved as
                        operand

EXECUTION TIME:         Instruction issue, 1 CP

                        Vj or Vk ready in (VL) + 3 CPs if data
                        available††

                        Vi ready in (VL) + 8 CPs if data available††

                        Unit ready, (VL) + 4 CPs if data available††

SPECIAL CASES:          For instruction 154, if j=0, then (Sj)=0 and
                        (Vi element) = (Vk element).

                        For instruction 156, if j=0, then (Sj)=0 and
                        (Vi element) = -(Vk element).



†  Special CAL syntax

†† Vector instructions may or may not start execution immediately; they
   execute as data becomes available. In particular, a memory conflict
   that slows execution of some elements of a vector load can cause
   delays in all instructions in the operation chain, starting with that
   load.
INSTRUCTIONS 160 - 167



CAL Syntax     Description                                   Octal Code

Vi Sj*FVk      Floating-point products of (Sj) and           160ijk
               (Vk elements) to Vi elements

Vi Vj*FVk      Floating-point products of (Vj elements)      161ijk
               and (Vk elements) to Vi elements

Vi Sj*HVk      Half-precision rounded floating-point         162ijk
               products of (Sj) and (Vk elements) to
               Vi elements

Vi Vj*HVk      Half-precision rounded floating-point         163ijk
               products of (Vj elements) and (Vk elements)
               to Vi elements

Vi Sj*RVk      Rounded floating-point products of (Sj) and   164ijk
               (Vk elements) to Vi elements

Vi Vj*RVk      Rounded floating-point products of            165ijk
               (Vj elements) and (Vk elements) to Vi
               elements

Vi Sj*IVk      Reciprocal iterations; 2-(Sj)*(Vk elements)   166ijk
               to Vi elements

Vi Vj*IVk      Reciprocal iterations; 2-(Vj elements)*       167ijk
               (Vk elements) to Vi elements



Instructions 160 through 167 are executed in the Floating-point Multiply
functional unit. The number of operations performed by an instruction is
determined by the contents of the VL register. All operations start with
element 0 of the V registers and increment the element number by 1 for
each successive operation.

Operands are assumed to be in floating-point format. Instructions 160, 
162, 164, and 166 deliver a copy of (Sj) to the functional unit where 
the copy is retained as one of the operands until the completion of the 
operation. Therefore, Sj can be changed immediately without affecting 
the vector operation. The other operand is an element of Vk. For 
instructions 161, 163, 165, and 167, both operands are obtained from V 
registers. 

All results are delivered to elements of Vi. If either operand is not 
normalized, there is no guarantee that the products will be normalized. 
If neither operand is normalized, the product will not be normalized. 

Out-of-range conditions are described in section 4. 

Instruction 160 forms the products of the floating-point quantity in
Sj and the floating-point quantities in elements of Vk and enters
the results into Vi.

Instruction 161 forms the products of the floating-point quantities in
elements of Vj and Vk and enters the results into Vi.

Instruction 162 forms the half-precision rounded products of the
floating-point quantity in Sj and the floating-point quantities in
elements of Vk and enters the results into Vi. The low-order 19
bits of the result elements are zeroed.

Instruction 163 forms the half-precision rounded products of the
floating-point quantities in elements of Vj and Vk and enters the
results into Vi. The low-order 19 bits of the result elements are
zeroed.

Instruction 164 forms the rounded products of the floating-point
quantity in Sj and the floating-point quantities in elements of Vk
and enters the results into Vi.

Instruction 165 forms the rounded products of the floating-point
quantities in elements of Vj and Vk and enters the results into Vi.

Instruction 166 forms, for each element, two minus the product of the
floating-point quantity in Sj and the floating-point quantity in the
element of Vk. It then enters the results into Vi. See the
description of instruction 067 for more details.

Instruction 167 forms, for each element pair, two minus the product of
the floating-point quantities in elements of Vj and Vk and enters
the results into Vi. See the description of instruction 067 for more
details.



HOLD ISSUE CONDITIONS:  Vk reserved as operand

                        Vi reserved as operand or result

                        Instruction 16x in process, unit busy
                        (VL) + 4 CPs†

                        Instructions 140-145 in process in Second Vector
                        Logical unit. Unit busy (VL) + 4 CPs

                        For instructions 160, 162, 164, and 166, Sj
                        reserved (except S0)

                        For instructions 161, 163, 165, and 167, Vj
                        reserved as operand

EXECUTION TIME:         Instruction issue, 1 CP

                        Vj and Vk ready in (VL) + 3 CPs if data
                        available†

                        Vi ready in (VL) + 12 CPs if data
                        available†

                        Unit ready, (VL) + 4 CPs if data available†

SPECIAL CASES:          (Sj)=0 if j=0.



† Vector instructions may or may not start execution immediately; they
  execute as data becomes available. In particular, a memory conflict
  that slows execution of some elements of a vector load can cause
  delays in all instructions in the operation chain, starting with that
  load.
INSTRUCTIONS 170 - 173



CAL Syntax     Description                                   Octal Code

Vi Sj+FVk      Floating-point sums of (Sj) and               170ijk
               (Vk elements) to Vi elements

Vi +FVk†       Transmit normalized (Vk elements) to Vi       170i0k
               elements

Vi Vj+FVk      Floating-point sums of (Vj elements) and      171ijk
               (Vk elements) to Vi elements

Vi Sj-FVk      Floating-point differences of (Sj) and        172ijk
               (Vk elements) to Vi elements

Vi -FVk†       Transmit normalized negatives of              172i0k
               (Vk elements) to Vi elements

Vi Vj-FVk      Floating-point differences of (Vj elements)   173ijk
               and (Vk elements) to Vi elements



Instructions 170 through 173 are executed in the Floating-point Add
functional unit. Instructions 170 and 171 perform floating-point
addition; instructions 172 and 173 perform floating-point subtraction.
The number of additions or subtractions performed by an instruction is
determined by the contents of the VL register. All operations start with
element 0 of the V registers and increment the element number by 1 for
each operation performed. All results are delivered to Vi normalized,
even if the operands are not normalized.

Instructions 170 and 172 deliver a copy of (Sj) to the functional unit 
where it remains as one of the operands until the completion of the 
operation. The other operand is an element of Vk. For instructions 
171 and 173, both operands are obtained from V registers. Out-of-range 
conditions are described in section 4. 



HOLD ISSUE CONDITIONS:  Vk reserved as operand

                        Vi reserved as operand or result

                        Instructions 170 through 173 in process, unit
                        busy (VL) + 4 CPs††

                        For instructions 170 and 172, Sj reserved
                        (except S0)

                        For instructions 171 and 173, Vj reserved as
                        operand

EXECUTION TIME:         Instruction issue, 1 CP

                        Vj and Vk ready in (VL) + 3 CPs if data
                        available††

                        Vi ready in (VL) + 11 CPs if data available††

                        Unit ready, (VL) + 4 CPs if data available††

SPECIAL CASES:          (Sj)=0 if j=0.



†  Special CAL syntax

†† Vector instructions may or may not start execution immediately; they
   execute as data becomes available. In particular, a memory conflict
   that slows execution of some elements of a vector load can cause
   delays in all instructions in the operation chain, starting with that
   load.
INSTRUCTION 174



CAL Syntax     Description                                   Octal Code

Vi /HVj        Floating-point reciprocal approximation of    174ij0
               (Vj elements) to Vi elements



Instruction 174 is executed in the Reciprocal Approximation functional
unit. The instruction forms an approximate value of the reciprocal of
the normalized floating-point quantity in each element of Vj and enters
the result into elements of Vi. The number of elements for which
approximations are found is determined by the contents of the VL register.

Instruction 174 occurs in the divide sequence to compute the quotients of 
floating-point quantities as described in section 4 under floating-point 
arithmetic. 

The reciprocal approximation instruction produces results of 30 
significant bits. The low-order 18 bits are zeros. The number of 
significant bits can be extended to 48 using the reciprocal iteration 
instruction and a multiply. 
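
The way a reciprocal iteration and a multiply extend a 30-bit
approximation can be illustrated numerically. The sketch below is not a
model of the hardware: it uses Python floats (IEEE double precision)
rather than the machine's floating-point format, the helper names are
hypothetical, and truncating the fraction to 30 bits is only an assumed
stand-in for the accuracy of the approximation. The actual divide
sequence is the one described in section 4.

```python
import math


def approx_reciprocal(b, bits=30):
    """Stand-in for 174: a reciprocal good to roughly 'bits' bits (assumed)."""
    fraction, exponent = math.frexp(1.0 / b)   # fraction magnitude in [0.5, 1)
    scale = 1 << bits
    truncated = math.floor(fraction * scale) / scale
    return math.ldexp(truncated, exponent)


def refine_reciprocal(b, x0):
    """One reciprocal iteration, 2 - b*x0 (the 166/167 form), then a multiply."""
    correction = 2.0 - b * x0
    return x0 * correction


b = 3.0
x0 = approx_reciprocal(b)        # error on the order of 2**-30
x1 = refine_reciprocal(b, x0)    # error roughly squared by the iteration
```

The design point is that each iteration roughly doubles the number of
correct bits, so one pass is enough to go from about 30 bits to the full
fraction width.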



HOLD ISSUE CONDITIONS:  Vi reserved as operand or result

                        Vj reserved as operand

                        Instruction 174 in process, unit busy for
                        (VL) + 4 CPs†

EXECUTION TIME:         Instruction issue, 1 CP

                        Vj ready in (VL) + 3 CPs if data available†

                        Vi ready in (VL) + 19 CPs if data
                        available†

                        Unit ready, (VL) + 4 CPs if data available†

SPECIAL CASES:          (Vi element) is meaningless if (Vj element)
                        is not normalized; the unit assumes that bit
                        2^47 of (Vj element) is 1; no test of this
                        bit is made.



† Vector instructions may or may not start execution immediately; they
  execute as data becomes available. In particular, a memory conflict
  that slows execution of some elements of a vector load can cause
  delays in all instructions in the operation chain, starting with that
  load.
INSTRUCTIONS 174ij1 - 174ij2



CAL Syntax     Description                                   Octal Code

Vi PVj         Population count of (Vj elements) to Vi       174ij1
               elements

Vi QVj         Population count parity of (Vj elements) to   174ij2
               Vi elements



Instructions 174ij1 and 174ij2 are executed in the Vector
Population/Parity functional unit, sharing some logic with the Reciprocal
Approximation functional unit.

Instruction 174ij1 counts the number of bits set to 1 in each element
of Vj and enters the results into corresponding elements of Vi. The
results are entered into the low-order 7 bits of each Vi element; the
remaining high-order bits of each Vi element are zeroed.

Instruction 174ij2 counts the number of bits set to 1 in each element 
of Vj. The least significant bit of each element result shows whether 
the result is an odd or even number. Only the least significant bit of 
each element is transferred to the least significant bit position of the 
corresponding element of register Vi. The remainder of the element is 
set to zeros. The actual population count results are not transferred. 
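
Both instructions can be sketched in a few lines. The function names and
list representation are assumptions for illustration. A count of 64 is
the largest possible result, which is why 7 low-order bits suffice.

```python
MASK64 = (1 << 64) - 1


def vector_popcount(vj, vl):
    """Model of 174ij1: number of 1 bits in each of elements 0 .. (VL)-1."""
    return [bin(vj[n] & MASK64).count("1") for n in range(vl)]


def vector_parity(vj, vl):
    """Model of 174ij2: only the least significant bit of each count is kept."""
    return [count & 1 for count in vector_popcount(vj, vl)]
```

For instance, an element containing binary 1011 gives a population count
of 3 and a parity of 1.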



HOLD ISSUE CONDITIONS:  Vi reserved as operand or result

                        Vj reserved as operand

                        Instructions 174xx1 and 174xx2 in process,
                        unit busy for (VL) + 4 CPs†

                        Instruction 174xx0 in process, unit busy for
                        (VL) + 9 CPs†

                        Instruction 070 in process, unit busy (070 issue
                        time) + 7 CPs†

EXECUTION TIME:         Instruction issue, 1 CP

                        Vj ready in (VL) + 3 CPs if data available†

                        Vi ready in (VL) + 10 CPs if data available†

                        Unit ready, (VL) + 4 CPs if data available†



† Vector instructions may or may not start execution immediately; they
  execute as data becomes available. In particular, a memory conflict
  that slows execution of some elements of a vector load can cause
  delays in all instructions in the operation chain, starting with that
  load.
INSTRUCTION 175



CAL Syntax     Description                                   Octal Code

VM Vj,Z        VM=1 when (Vj element)=0                      1750j0

VM Vj,N        VM=1 when (Vj element)≠0                      1750j1

VM Vj,P        VM=1 when (Vj element) positive,              1750j2
               (bit 2^63=0), includes (Vj element)=0

VM Vj,M        VM=1 when (Vj element) negative,              1750j3
               (bit 2^63=1)

Vi,VM Vj,Z     VM=1 and (Vi compress element)=element        175ij4
               index when (Vj element)=0

Vi,VM Vj,N     VM=1 and (Vi compress element)=element        175ij5
               index when (Vj element)≠0

Vi,VM Vj,P     VM=1 and (Vi compress element)=element        175ij6
               index when (Vj element) positive,
               (bit 2^63=0), includes (Vj element)=0

Vi,VM Vj,M     VM=1 and (Vi compress element)=element        175ij7
               index when (Vj element) negative,
               (bit 2^63=1)



Vector mask and compress index instruction 175 is executed in the Full
Vector Logical functional unit.

Instruction 1750jk, where k=0 through 3, creates a vector mask in VM
based on the results of testing the contents of the elements of register
Vj. Each bit of VM corresponds to an element of Vj. Bit 2^63
corresponds to element 0; bit 2^0 corresponds to element 63.

Instruction 175ijk, where k=4 through 7, creates the same vector
mask as 1750jk and in addition creates a compressed index list in
register Vi based on the results of testing the contents of the
elements of register Vj (see example).

The type of test made by the instruction depends on the low-order 2 bits
of the k designator. The high-order bit of the k designator is used to
select the compress index option.

If the k designator is 0, the VM bit is set to 1 when (Vj element) is
0 and is set to 0 when (Vj element) is nonzero.

If the k designator is 1, the VM bit is set to 1 when (Vj element) is
nonzero and is set to 0 when (Vj element) is 0.

If the k designator is 2, the VM bit is set to 1 when (Vj element) is
positive and is set to 0 when (Vj element) is negative. A zero value
is considered positive.

If the k designator is 3, the VM bit is set to 1 when (Vj element) is
negative and is set to 0 when (Vj element) is positive. A zero value
is considered positive.

If the k designator is 4, the VM bit is set to 1 and register (Vi
compress element) is set to Vj element index when (Vj element) is 0.
Register Vi elements are written to and Vi element pointer advanced
only when (Vj element) is 0.

If the k designator is 5, the VM bit is set to 1 and register (Vi
compress element) is set to Vj element index when (Vj element) is
nonzero. Register Vi elements are written to and Vi element pointer
advanced only when (Vj element) is nonzero.

If the k designator is 6, the VM bit is set to 1 and register (Vi
compress element) is set to Vj element index when (Vj element) is
positive. Register Vi elements are written to and Vi element pointer
advanced only when (Vj element) is positive. A zero value is
considered positive.

If the k designator is 7, the VM bit is set to 1 and register (Vi
compress element) is set to Vj element index when (Vj element) is
negative. Register Vi elements are written to and Vi element pointer
advanced only when (Vj element) is negative.

The number of elements tested is determined by the contents of the VL
register. VM bits corresponding to untested elements of Vj are zeroed.

Vector mask instruction 1750jk, k=0 through 3, and compress index
instruction 175ijk, k=4 through 7, provide a vector counterpart to
the scalar conditional branch instructions.
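
The mask and compress index behavior can be sketched as follows. The
function name, the return of a (VM, Vi) pair, and the list representation
of registers are assumptions made for illustration. The low-order 2 bits
of k select the test and the high-order bit selects the compress option,
as described above.

```python
MASK64 = (1 << 64) - 1
SIGN = 1 << 63


def vector_mask(vj, vl, k, vi=None):
    """Model of 175: returns (VM, Vi) after testing elements 0 .. (VL)-1."""
    tests = {
        0: lambda w: w == 0,             # Z
        1: lambda w: w != 0,             # N
        2: lambda w: (w & SIGN) == 0,    # P, zero counts as positive
        3: lambda w: (w & SIGN) != 0,    # M
    }
    test = tests[k & 3]
    compress = bool(k & 4)
    vi = list(vi) if vi is not None else [0] * 64
    vm = 0                               # bits for untested elements stay 0
    pointer = 0                          # Vi element pointer
    for n in range(vl):
        if test(vj[n] & MASK64):
            vm |= 1 << (63 - n)          # bit 2^63 corresponds to element 0
            if compress:
                vi[pointer] = n          # write index, then advance pointer
                pointer += 1
    return vm, vi
```

Applied with k=4 to an 11-element Vj whose elements 0, 2, 5, 6 and 10
(octal 12) are zero, this yields the index list 0, 2, 5, 6, 10 in the
first five elements of Vi and leaves the rest of Vi unchanged.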



HOLD ISSUE CONDITIONS:  Vj reserved as operand

                        Instruction 14x in process, unit busy
                        (VL) + 4 CPs

                        Instruction 175 in process, unit busy
                        (VL) + 4 CPs

                        For instruction 175 (k=4 through 7), if
                        register Vi reserved as operand or result.

EXECUTION TIME:         Instruction issue, 1 CP

                        Vj ready, (VL) + 3 CPs if data available

                        For instruction 175 (k=4 through 7), Vi ready
                        in (VL) + 10 CPs if data is available.

                        Except for instruction 073, VM ready (VL) + 4 CPs
                        if data available

                        For instruction 073, VM ready (VL) + 5 CPs if
                        data available

SPECIAL CASES:          k=0 or 4, VM bit xx=1 if (Vj element xx)=0.

                        k=1 or 5, VM bit xx=1 if (Vj element xx)≠0.

                        k=2 or 6, VM bit xx=1 if (Vj element xx) is
                        positive; 0 is a positive condition.

                        k=3 or 7, VM bit xx=1 if (Vj element xx) is
                        negative.

                        k=4, (Vi compress element)=xx if (Vj element
                        xx)=0.

                        k=5, (Vi compress element)=xx if (Vj element
                        xx)≠0.

                        k=6, (Vi compress element)=xx if (Vj element
                        xx) is positive; 0 is a positive condition.

                        k=7, (Vi compress element)=xx if (Vj element
                        xx) is negative.

                        For instruction 175 (k=4 through 7), if no test
                        conditions are true, then (VM)=0 and no writes to
                        register Vi occur and the elements of Vi will be
                        unchanged by this instruction.



Example:

This example of the compress index instruction 175ij4 generates the same
vector mask as instruction 1750j0 and also generates data into vector
register Vi as follows:

Vector length=13 (octal)

Vector     Register       Vector     Register
element    Vi data        element    Vj data

  00       00               00       Zero
  01       02               01       Nonzero
  02       05               02       Zero
  03       06               03       Nonzero
  04       12               04       Nonzero
  05       Unchanged        05       Zero
  06       Unchanged        06       Zero
  .        .                07       Nonzero
  .        .                10       Nonzero
  .        .                11       Nonzero
  .        .                12       Zero
INSTRUCTIONS 176 - 177



CAL Syntax     Description                                   Octal Code

Vi ,A0,Ak      Transmit (VL) words from memory to Vi         176i0k
               elements starting at memory address (A0) and
               incrementing by (Ak) for successive
               addresses

Vi ,A0,1       Transmit (VL) words from memory to Vi         176i00
               elements starting at memory address (A0) and
               incrementing by 1 for successive addresses

Vi ,A0,Vk      Transmit (VL) words from memory to Vi         176i1k
               elements using memory address (A0) +
               (Vk elements)

,A0,Ak Vj      Transmit (VL) words from Vj elements to       1770jk
               memory starting at memory address (A0) and
               incrementing by (Ak) for successive
               addresses

,A0,1 Vj       Transmit (VL) words from Vj elements to       1770j0
               memory starting at memory address (A0) and
               incrementing by 1 for successive addresses

,A0,Vk Vj      Transmit (VL) words from Vj elements to       1771jk
               memory using memory address (A0) +
               (Vk elements)



Instructions 176 and 177 transfer blocks of data between V registers and 
memory. 

Instruction 176 transfers data from memory to elements of register Vi. 

Instruction 177 transfers data from elements of register Vj to memory. 

For instructions 176i0k and 1770jk, register elements begin with 0 
and are incremented by 1 for each transfer. Memory addresses begin with 
(A0) and are incremented by the contents of Ak. Ak contains a signed 
24-bit integer which is added to the address of the current word to 
obtain the address of the next word. Ak can specify either a positive 
or negative increment, allowing both forward and backward streams of 
reference. 

The number of words transferred is determined by the contents of the VL 
register. 
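
The address sequence described above can be sketched as follows. This is a minimal illustration under assumptions, not the hardware algorithm: the helper names are hypothetical, memory is modeled as a Python dictionary, address wraparound is not modeled, and the special case for k=0 (described under SPECIAL CASES) is left out.

```python
def sign_extend_24(value):
    """Interpret the low-order 24 bits of value as a signed integer."""
    value &= 0xFFFFFF
    return value - (1 << 24) if value & 0x800000 else value


def strided_addresses(a0, ak, vl):
    """Addresses referenced by a 176i0k or 1770jk style transfer.

    Element n of the V register pairs with address (A0) + n * (Ak),
    where (Ak) is treated as a signed 24-bit increment.
    """
    step = sign_extend_24(ak)
    return [a0 + n * step for n in range(vl)]


def load_strided(memory, a0, ak, vl):
    """Model of 176i0k: read VL words into successive V elements."""
    return [memory[addr] for addr in strided_addresses(a0, ak, vl)]


def store_strided(memory, a0, ak, vj, vl):
    """Model of 1770jk: write VL elements of Vj to memory."""
    for n, addr in enumerate(strided_addresses(a0, ak, vl)):
        memory[addr] = vj[n]


# A negative increment produces a backward stream of references.
print(strided_addresses(100, 0xFFFFFF, 3))
```

A value of 0xFFFFFF in Ak is -1 as a signed 24-bit integer, so the references walk backward from (A0).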
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INSTRUCTIONS 176 - 177 (continued) 

For instructions 176i1k and 1771jk, register elements begin with 0 
and are incremented by 1 for each transfer. The low-order 24 bits of 
each element of Vk contain a signed 24-bit integer which is added to 
(A0) to obtain the current memory address. 

The number of words transferred is determined by the contents of the VL 
register. 
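
The indexed forms (176i1k and 1771jk) can be sketched the same way. Again this is only a model under assumptions: the function names are hypothetical, memory is a Python dictionary, and only the addressing rule stated above (low-order 24 bits of each Vk element, signed, added to (A0)) is represented.

```python
def sign_extend_24(word):
    """Take the low-order 24 bits of word as a signed integer."""
    word &= 0xFFFFFF
    return word - (1 << 24) if word & 0x800000 else word


def gather(memory, a0, vk, vl):
    """Model of 176i1k: element n is read from (A0) + (Vk element n)."""
    return [memory[a0 + sign_extend_24(vk[n])] for n in range(vl)]


def scatter(memory, a0, vk, vj, vl):
    """Model of 1771jk: element n of Vj is stored at (A0) + (Vk element n)."""
    for n in range(vl):
        memory[a0 + sign_extend_24(vk[n])] = vj[n]


memory = {addr: addr * 10 for addr in range(20)}
# Bits above the low-order 24 of a Vk element do not affect the address.
print(gather(memory, 10, [0, 5, 0xFFFFFF, (1 << 40) | 2], 4))
```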

HOLD ISSUE CONDITIONS: For instruction 176 if Ports A and B busy 

For instruction 177 if Port C busy 

For instructions 176i1k and 1771jk, if 
176i1k or 1771jk in progress 

A0 reserved 

For instructions 176i0k and 1770jk, if Ak 
reserved where k=1 through 7 

Scalar reference in CP1, CP2, or CP3 

For instruction 176, V register i reserved as 
operand or result 

For instruction 177, V register j reserved as 
operand 

For instructions 176i1k and 1771jk, V register 
k reserved as operand 

If not bidirectional memory mode, then 
instruction 176 holds on Port C busy and 
instruction 177 holds on Port A or B busy. 



EXECUTION TIME: 



For instruction 176i0k: 
Instruction issue, 1 CP 

Vi ready, (VL) + 17 CPs if memory is available 
Port A or B busy, (VL) + 6 CPs 

For instruction 1770jk: 
Instruction issue, 1 CP 

Vj ready, (VL) + 3 CPs if data is available 
Port C busy, (VL) + 7 CPs 



For instruction 176i1k: 
Instruction issue, 1 CP 
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INSTRUCTIONS 176 - 177 (continued) 

Vi ready, (VL) + 21 CPs if memory is available 
Vk ready, (VL) + 3 CPs if data is available 
Port A or B busy, (VL) + 10 CPs 
176i1k busy, (VL) + 10 CPs 

For instruction 1771jk: 
Instruction issue, 1 CP 
Vj and Vk ready, (VL) + 3 CPs if data is 
available 

Port C busy, (VL) + 10 CPs 
1771jk busy, (VL) + 10 CPs 



SPECIAL CASES: 



For instructions 176i0k and 1770jk, 
increment (A0)=1 if k=0. 



Instruction 176 uses Port B. If Port B is busy 
at issue time, instruction 176 uses Port A. 
Instruction 177 uses Port C. 

For instructions 176i0k and 1770jk, 
(Ak) determines the memory increment. 
Successive addresses are located in successive 
banks. References to the same bank can be made 
every 4 CPs or more. Incrementing (Ak) by 64 
places successive memory references in the same 
bank, so a word is transferred every 4 CPs or 
more. If the address is incremented by 32, every 
other reference is to the same bank, and words 
can transfer no faster than one every 2 CPs. 
With any address incrementing that allows 4 CPs 
before addressing the same bank, the words can 
transfer each CP. 

Memory conflict can slow loading or storing of 
individual vector elements. The elements are 
loaded or stored in order, so any delay for any 
element delays all succeeding elements. 

For instruction 176, if there is an instruction 
using its destination register as a source, the 
execution of that instruction is delayed whenever 
there is a delay in instruction 176 results. 
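
The effect of the memory increment on transfer rate can be illustrated with a small timing model. It is a sketch under assumptions and not a description of the actual memory control logic: it assumes 64 banks selected by the low-order address bits (which is what the increments of 64 and 32 above imply), a bank that stays busy for 4 CPs after a reference, at most one reference started per CP, and strictly in-order references. The function name is hypothetical.

```python
def reference_times(increment, n_words, n_banks=64, bank_busy=4):
    """Return the CP at which each successive reference can start.

    Assumed model: reference n goes to bank (n * increment) mod n_banks,
    a bank is busy for bank_busy CPs after a reference, one reference
    can start per CP, and references are made in order.
    """
    bank_free = [0] * n_banks   # first CP at which each bank is free again
    times = []
    cp = 0
    address = 0
    for _ in range(n_words):
        bank = address % n_banks
        cp = max(cp, bank_free[bank])   # wait if the bank is still busy
        times.append(cp)
        bank_free[bank] = cp + bank_busy
        cp += 1                         # next reference is at least 1 CP later
        address += increment
    return times


for inc in (1, 16, 32, 64):
    print(inc, reference_times(inc, 6))
```

Under these assumptions an increment of 64 yields one word every 4 CPs, an increment of 32 yields two words every 4 CPs (one every 2 CPs on average), and increments of 1 or 16 yield one word each CP, which agrees with the description above.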
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APPENDIX SECTION 



INSTRUCTION SUMMARY 
FOR CRAY X-MP MODEL 48 



CRAY X-MP    CAL           UNIT         DESCRIPTION 

000000       ERR           -            Error exit 
††0010jk     CA,Aj Ak      -            Set the channel (Aj) current 
                                        address to (Ak) and begin the I/O 
                                        sequence 
††0011jk     CL,Aj Ak      -            Set the channel (Aj) limit address 
                                        to (Ak) 
††0012j0     CI,Aj         -            Clear Channel (Aj) Interrupt flag; 
                                        clear device master-clear (output 
                                        channel). 
††0012j1     MC,Aj         -            Clear Channel (Aj) Interrupt flag; 
                                        set device master-clear (output 
                                        channel); clear device ready-held 
                                        (input channel). 
††0013j0     XA Aj         -            Enter XA register with (Aj) 
††0014j0     RT Sj         -            Enter RTC register with (Sj) 
††0014j1     IP,j 1        -            Set interprocessor interrupt 
††001402     IP 0          -            Clear interprocessor interrupt 
††001403     CLN 0         -            Enter CLN register with 0 
††001413     CLN 1         -            Enter CLN register with 1 
††001423     CLN 2         -            Enter CLN register with 2 
††001433     CLN 3         -            Enter CLN register with 3 
††001443     CLN 4         -            Enter CLN register with 4 
††001453     CLN 5         -            Enter CLN register with 5 
††0014j4     PCI Sj        -            Enter II register with (Sj) 
††001405     CCI           -            Clear PCI request 
††001406     ECI           -            Enable PCI request 
††001407     DCI           -            Disable PCI request 
††0015j0     †††           -            Select performance monitor 
††001501     †††           -            Set maintenance read mode 
††001511     †††           -            Load diagnostic checkbyte with S1 
††001521     †††           -            Set maintenance write mode 1 
††001531     †††           -            Set maintenance write mode 2 
00200k       VL Ak         -            Transmit (Ak) to VL register 
†002000      VL 1          -            Transmit 1 to VL register 
002100       EFI           -            Enable interrupt on floating-point 
                                        error 



† Special syntax form 

†† Privileged to monitor mode 

††† Not supported at this time 

CRAY X-MP    CAL           UNIT         DESCRIPTION 

002200       DFI           -            Disable interrupt on floating-point 
                                        error 
002300       ERI           -            Enable operand range interrupts 
002400       DRI           -            Disable operand range interrupts 
002500       DBM           -            Disable bidirectional memory 
                                        transfers 
002600       EBM           -            Enable bidirectional memory 
                                        transfers 
002700       CMR           -            Complete memory references 
0030j0       VM Sj         -            Transmit (Sj) to VM register 
†003000      VM 0          -            Clear VM register 
0034jk       SMjk 1,TS     -            Test & set semaphore jk in SM 
0036jk       SMjk 0        -            Clear semaphore jk in SM 
0037jk       SMjk 1        -            Set semaphore jk in SM 
004000       EX            -            Normal exit 
0050jk       J Bjk         -            Jump to (Bjk) 
006ijkm      J exp         -            Jump to exp 
007ijkm      R exp         -            Return jump to exp; set B00 to P. 
010ijkm      JAZ exp       -            Branch to exp if (A0)=0 
011ijkm      JAN exp       -            Branch to exp if (A0)≠0 
012ijkm      JAP exp       -            Branch to exp if (A0) positive; 
                                        0 is positive. 
013ijkm      JAM exp       -            Branch to exp if (A0) negative 
014ijkm      JSZ exp       -            Branch to exp if (S0)=0 
015ijkm      JSN exp       -            Branch to exp if (S0)≠0 
016ijkm      JSP exp       -            Branch to exp if (S0) positive; 
                                        0 is positive. 
017ijkm      JSM exp       -            Branch to exp if (S0) negative 
01hijkm      Ah exp        -            Transmit exp=ijkm to Ah 
020ijkm      Ai exp        -            Transmit exp=jkm to Ai 
021ijkm      Ai exp        -            Transmit exp=ones complement of 
                                        jkm to Ai 
022ijk       Ai exp        -            Transmit exp=jk to Ai 
023ij0       Ai Sj         -            Transmit (Sj) to Ai 
023i01       Ai VL         -            Transmit (VL) to Ai 
024ijk       Ai Bjk        -            Transmit (Bjk) to Ai 
025ijk       Bjk Ai        -            Transmit (Ai) to Bjk 
026ij0       Ai PSj        Pop/LZ       Population count of (Sj) to Ai 
026ij1       Ai QSj        Pop/LZ       Population count parity of (Sj) 
                                        to Ai 
026ij7       Ai SBj        -            Transmit (SBj) to Ai 
027ij0       Ai ZSj        Pop/LZ       Leading zero count of (Sj) to Ai 
027ij7       SBj Ai        -            Transmit (Ai) to SBj 
030ijk       Ai Aj+Ak      A Int Add    Integer sum of (Aj) and (Ak) to 
                                        Ai 
†030i0k      Ai Ak         A Int Add    Transmit (Ak) to Ai 
†030ij0      Ai Aj+1       A Int Add    Integer sum of (Aj) and 1 to Ai 



† Special syntax form 

CRAY X-MP    CAL             UNIT         DESCRIPTION 

031ijk       Ai Aj-Ak        A Int Add    Integer difference of (Aj) less 
                                          (Ak) to Ai 
†031i00      Ai -1           A Int Add    Transmit -1 to Ai 
†031i0k      Ai -Ak          A Int Add    Transmit the negative 
                                          of (Ak) to Ai 
†031ij0      Ai Aj-1         A Int Add    Integer difference of (Aj) less 
                                          1 to Ai 
032ijk       Ai Aj*Ak        A Int Mult   Integer product of (Aj) and 
                                          (Ak) to Ai 
033i00       Ai CI           -            Channel number to Ai (j=0) 
033ij0       Ai CA,Aj        -            Address of channel (Aj) to Ai 
                                          (j≠0; k=0) 
033ij1       Ai CE,Aj        -            Error flag of channel (Aj) to Ai 
                                          (j≠0; k=1) 
034ijk       Bjk,Ai ,A0      Memory       Read (Ai) words to B register 
                                          jk from (A0) 
†034ijk      Bjk,Ai 0,A0     Memory       Read (Ai) words to B register 
                                          jk from (A0) 
035ijk       ,A0 Bjk,Ai      Memory       Store (Ai) words at B register 
                                          jk to (A0) 
†035ijk      0,A0 Bjk,Ai     Memory       Store (Ai) words at B register 
                                          jk to (A0) 
036ijk       Tjk,Ai ,A0      Memory       Read (Ai) words to T register 
                                          jk from (A0) 
†036ijk      Tjk,Ai 0,A0     Memory       Read (Ai) words to T register 
                                          jk from (A0) 
037ijk       ,A0 Tjk,Ai      Memory       Store (Ai) words at T register 
                                          jk to (A0) 
†037ijk      0,A0 Tjk,Ai     Memory       Store (Ai) words at T register 
                                          jk to (A0) 
040ijkm      Si exp          -            Transmit jkm to Si 
041ijkm      Si exp          -            Transmit exp=ones complement of 
                                          jkm to Si 
042ijk       Si <exp         S Logical    Form ones mask exp bits in Si 
                                          from the right; jk field gets 
                                          64-exp. 
†042ijk      Si #>exp        S Logical    Form zeros mask exp bits in Si 
                                          from the left; jk field gets 
                                          64-exp. 
†042i77      Si 1            S Logical    Enter 1 into Si 
†042i00      Si -1           S Logical    Enter -1 into Si 
043ijk       Si >exp         S Logical    Form ones mask exp bits in Si 
                                          from the left; jk field gets 
                                          exp. 
†043ijk      Si #<exp        S Logical    Form zeros mask exp bits in Si 
                                          from the right; jk field gets 
                                          64-exp. 
†043i00      Si 0            S Logical    Clear Si 



† Special syntax form 

CRAY X-MP    CAL             UNIT         DESCRIPTION 

044ijk       Si Sj&Sk        S Logical    Logical product of (Sj) and 
                                          (Sk) to Si 
†044ij0      Si Sj&SB        S Logical    Sign bit of (Sj) to Si 
†044ij0      Si SB&Sj        S Logical    Sign bit of (Sj) to Si 
                                          (j≠0) 
045ijk       Si #Sk&Sj       S Logical    Logical product of (Sj) and 
                                          ones complement of (Sk) to Si 
†045ij0      Si #SB&Sj       S Logical    (Sj) with sign bit cleared to Si 
046ijk       Si Sj\Sk        S Logical    Logical difference of (Sj) and 
                                          (Sk) to Si 
†046ij0      Si Sj\SB        S Logical    Toggle sign bit of Sj, then 
                                          enter into Si 
†046ij0      Si SB\Sj        S Logical    Toggle sign bit of Sj, then 
                                          enter into Si (j≠0) 
047ijk       Si #Sj\Sk       S Logical    Logical equivalence of (Sk) and 
                                          (Sj) to Si 
†047i0k      Si #Sk          S Logical    Transmit ones complement of (Sk) 
                                          to Si 
†047ij0      Si #Sj\SB       S Logical    Logical equivalence of (Sj) and 
                                          sign bit to Si 
†047ij0      Si #SB\Sj       S Logical    Logical equivalence of (Sj) and 
                                          sign bit to Si (j≠0) 
†047i00      Si #SB          S Logical    Enter ones complement of sign 
                                          bit into Si 
050ijk       Si Sj!Si&Sk     S Logical    Logical product of (Si) and (Sk) 
                                          complement ORed with logical 
                                          product of (Sj) and (Sk) to Si 
†050ij0      Si Sj!Si&SB     S Logical    Scalar merge of (Si) and sign 
                                          bit of (Sj) to Si 
051ijk       Si Sj!Sk        S Logical    Logical sum of (Sj) and (Sk) to 
                                          Si 
†051i0k      Si Sk           S Logical    Transmit (Sk) to Si 
†051ij0      Si Sj!SB        S Logical    Logical sum of (Sj) and sign 
                                          bit to Si 
†051ij0      Si SB!Sj        S Logical    Logical sum of (Sj) and sign 
                                          bit to Si (j≠0) 
†051i00      Si SB           S Logical    Enter sign bit into Si 
052ijk       S0 Si<exp       S Shift      Shift (Si) left exp=jk 
                                          places to S0 
053ijk       S0 Si>exp       S Shift      Shift (Si) right 
                                          exp=64-jk places to S0 
054ijk       Si Si<exp       S Shift      Shift (Si) left exp=jk 
                                          places 
055ijk       Si Si>exp       S Shift      Shift (Si) right 
                                          exp=64-jk places 
056ijk       Si Si,Sj<Ak     S Shift      Shift (Si and Sj) left (Ak) 
                                          places to Si 



† Special syntax form 

CRAY X-MP    CAL             UNIT         DESCRIPTION 

†056ij0      Si Si,Sj<1      S Shift      Shift (Si and Sj) left one 
                                          place to Si 
†056i0k      Si Si<Ak        S Shift      Shift (Si) left (Ak) places 
                                          to Si 
057ijk       Si Sj,Si>Ak     S Shift      Shift (Sj and Si) right (Ak) 
                                          places to Si 
†057ij0      Si Sj,Si>1      S Shift      Shift (Sj and Si) right one 
                                          place to Si 
†057i0k      Si Si>Ak        S Shift      Shift (Si) right (Ak) places 
                                          to Si 
060ijk       Si Sj+Sk        S Int Add    Integer sum of (Sj) and (Sk) 
                                          to Si 
061ijk       Si Sj-Sk        S Int Add    Integer difference of (Sj) and 
                                          (Sk) to Si 
†061i0k      Si -Sk          S Int Add    Transmit negative of (Sk) 
                                          to Si 
062ijk       Si Sj+FSk       Fp Add       Floating-point sum of (Sj) and 
                                          (Sk) to Si 
†062i0k      Si +FSk         Fp Add       Normalize (Sk) to Si 
063ijk       Si Sj-FSk       Fp Add       Floating-point difference 
                                          of (Sj) and (Sk) to Si 
†063i0k      Si -FSk         Fp Add       Transmit normalized negative 
                                          of (Sk) to Si 
064ijk       Si Sj*FSk       Fp Mult      Floating-point product of (Sj) 
                                          and (Sk) to Si 
065ijk       Si Sj*HSk       Fp Mult      Half-precision rounded 
                                          floating-point product of (Sj) 
                                          and (Sk) to Si 
066ijk       Si Sj*RSk       Fp Mult      Full-precision rounded 
                                          floating-point product of (Sj) 
                                          and (Sk) to Si 
067ijk       Si Sj*ISk       Fp Mult      2-floating-point product of (Sj) 
                                          and (Sk) to Si 
070ij0       Si /HSj         Fp Rcpl      Floating-point reciprocal 
                                          approximation of (Sj) to Si 
071i0k       Si Ak           -            Transmit (Ak) to Si with no 
                                          sign extension 
071i1k       Si +Ak          -            Transmit (Ak) to Si with sign 
                                          extension 
071i2k       Si +FAk         -            Transmit (Ak) to Si as 
                                          unnormalized floating-point number 
071i30       Si 0.6          -            Transmit constant 0.75*2**48 to Si 
071i40       Si 0.4          -            Transmit constant 0.5 to Si 
071i50       Si 1.           -            Transmit constant 1.0 to Si 
071i60       Si 2.           -            Transmit constant 2.0 to Si 



† Special syntax form 

CRAY X-MP    CAL             UNIT         DESCRIPTION 

071i70       Si 4.           -            Transmit constant 4.0 to Si 
072i00       Si RT           -            Transmit (RTC) to Si 
072i02       Si SM           -            Transmit (SM) to Si 
072ij3       Si STj          -            Transmit (STj) to Si 
073i00       Si VM           -            Transmit (VM) to Si 
073i11       ††              -            Read counter into Si 
073i21       ††              -            Increment performance counter 
                                          (maintenance) 
073i31       ††              -            Clear all maintenance modes 
073ij1       Si SRj          -            Transmit (SRj) to Si (j=0) 
073i02       SM Si           -            Transmit (Si) to SM 
073ij3       STj Si          -            Transmit (Si) to STj 
074ijk       Si Tjk          -            Transmit (Tjk) to Si 
075ijk       Tjk Si          -            Transmit (Si) to Tjk 
076ijk       Si Vj,Ak        -            Transmit (Vj, element (Ak)) 
                                          to Si 
077ijk       Vi,Ak Sj        -            Transmit (Sj) to Vi element (Ak) 
†077i0k      Vi,Ak 0         -            Clear Vi element (Ak) 
10hijkm      Ai exp,Ah       Memory       Read from ((Ah)+exp) to Ai 
                                          (A0=0) 
†100ijkm     Ai exp,0        Memory       Read from (exp) to Ai 
†100ijkm     Ai exp,         Memory       Read from (exp) to Ai 
†10hi00      Ai ,Ah          Memory       Read from (Ah) to Ai 
11hijkm      exp,Ah Ai       Memory       Store (Ai) to (Ah)+exp (A0=0) 
†110ijkm     exp,0 Ai        Memory       Store (Ai) to exp 
†110ijkm     exp, Ai         Memory       Store (Ai) to exp 
†11hi00      ,Ah Ai          Memory       Store (Ai) to (Ah) 
12hijkm      Si exp,Ah       Memory       Read from ((Ah)+exp) to Si 
                                          (A0=0) 
†120ijkm     Si exp,0        Memory       Read from exp to Si 
†120ijkm     Si exp,         Memory       Read from exp to Si 
†12hi00      Si ,Ah          Memory       Read from (Ah) to Si 
13hijkm      exp,Ah Si       Memory       Store (Si) to (Ah)+exp (A0=0) 
†130ijkm     exp,0 Si        Memory       Store (Si) to exp 
†130ijkm     exp, Si         Memory       Store (Si) to exp 
†13hi00      ,Ah Si          Memory       Store (Si) to (Ah) 
140ijk       Vi Sj&Vk        V Logical    Logical products of (Sj) and 
                                          (Vk) to Vi 
141ijk       Vi Vj&Vk        V Logical    Logical products of (Vj) and 
                                          (Vk) to Vi 
142ijk       Vi Sj!Vk        V Logical    Logical sums of (Sj) and (Vk) 
                                          to Vi 
†142i0k      Vi Vk           V Logical    Transmit (Vk) to Vi 
143ijk       Vi Vj!Vk        V Logical    Logical sums of (Vj) and (Vk) 
                                          to Vi 



† Special syntax form 

†† Not supported at this time 

CRAY X-MP    CAL             UNIT         DESCRIPTION 

144ijk       Vi Sj\Vk        V Logical    Logical differences of (Sj) and 
                                          (Vk) to Vi 
145ijk       Vi Vj\Vk        V Logical    Logical differences of (Vj) and 
                                          (Vk) to Vi 
†145iii      Vi 0            V Logical    Clear Vi 
146ijk       Vi Sj!Vk&VM     V Logical    Transmit (Sj) if VM bit=1; 
                                          (Vk) if VM bit=0 to Vi. 
†146i0k      Vi #VM&Vk       V Logical    Vector merge of (Vk) and 0 
                                          to Vi 
147ijk       Vi Vj!Vk&VM     V Logical    Transmit (Vj) if VM bit=1; 
                                          (Vk) if VM bit=0 to Vi. 
150ijk       Vi Vj<Ak        V Shift      Shift (Vj) left (Ak) places 
                                          to Vi 
†150ij0      Vi Vj<1         V Shift      Shift (Vj) left one place to Vi 
151ijk       Vi Vj>Ak        V Shift      Shift (Vj) right (Ak) places 
                                          to Vi 
†151ij0      Vi Vj>1         V Shift      Shift (Vj) right one place to Vi 
152ijk       Vi Vj,Vj<Ak     V Shift      Double shift (Vj) left (Ak) 
                                          places to Vi 
†152ij0      Vi Vj,Vj<1      V Shift      Double shift (Vj) left one place 
                                          to Vi 
153ijk       Vi Vj,Vj>Ak     V Shift      Double shift (Vj) right (Ak) 
                                          places to Vi 
†153ij0      Vi Vj,Vj>1      V Shift      Double shift (Vj) right one 
                                          place to Vi 
154ijk       Vi Sj+Vk        V Int Add    Integer sums of (Sj) and (Vk) 
                                          to Vi 
155ijk       Vi Vj+Vk        V Int Add    Integer sums of (Vj) and (Vk) 
                                          to Vi 
156ijk       Vi Sj-Vk        V Int Add    Integer differences of (Sj) and 
                                          (Vk) to Vi 
†156i0k      Vi -Vk          V Int Add    Transmit negative of (Vk) 
                                          to Vi 
157ijk       Vi Vj-Vk        V Int Add    Integer differences of (Vj) and 
                                          (Vk) to Vi 
160ijk       Vi Sj*FVk       Fp Mult      Floating-point products of (Sj) 
                                          and (Vk) to Vi 
161ijk       Vi Vj*FVk       Fp Mult      Floating-point products of (Vj) 
                                          and (Vk) to Vi 
162ijk       Vi Sj*HVk       Fp Mult      Half-precision rounded 
                                          floating-point products of (Sj) 
                                          and (Vk) to Vi 
163ijk       Vi Vj*HVk       Fp Mult      Half-precision rounded 
                                          floating-point products of (Vj) 
                                          and (Vk) to Vi 
164ijk       Vi Sj*RVk       Fp Mult      Rounded floating-point products 
                                          of (Sj) and (Vk) to Vi 



† Special syntax form 

CRAY X-MP    CAL             UNIT         DESCRIPTION 

165ijk       Vi Vj*RVk       Fp Mult      Rounded floating-point products 
                                          of (Vj) and (Vk) to Vi 
166ijk       Vi Sj*IVk       Fp Mult      2-floating-point products of (Sj) 
                                          and (Vk) to Vi 
167ijk       Vi Vj*IVk       Fp Mult      2-floating-point products of (Vj) 
                                          and (Vk) to Vi 
170ijk       Vi Sj+FVk       Fp Add       Floating-point sums of (Sj) and 
                                          (Vk) to Vi 
†170i0k      Vi +FVk         Fp Add       Normalize (Vk) to Vi 
171ijk       Vi Vj+FVk       Fp Add       Floating-point sums of (Vj) and 
                                          (Vk) to Vi 
172ijk       Vi Sj-FVk       Fp Add       Floating-point differences of 
                                          (Sj) and (Vk) to Vi 
†172i0k      Vi -FVk         Fp Add       Transmit normalized 
                                          negatives of (Vk) to Vi 
173ijk       Vi Vj-FVk       Fp Add       Floating-point differences of 
                                          (Vj) and (Vk) to Vi 
174ij0       Vi /HVj         Fp Rcpl      Floating-point reciprocal 
                                          approximations of (Vj) to Vi 
174ij1       Vi PVj          V Pop        Population counts of (Vj) to Vi 
174ij2       Vi QVj          V Pop        Population count parities of (Vj) 
                                          to Vi 
1750j0       VM Vj,Z         V Logical    VM=1 where (Vj)=0 
1750j1       VM Vj,N         V Logical    VM=1 where (Vj)≠0 
1750j2       VM Vj,P         V Logical    VM=1 if (Vj) positive; 0 is 
                                          positive. 
1750j3       VM Vj,M         V Logical    VM=1 if (Vj) negative 
175ij4       Vi,VM Vj,Z      V Logical    VM=1 and (Vi)=element index if 
                                          (Vj)=0 
175ij5       Vi,VM Vj,N      V Logical    VM=1 and (Vi)=element index if 
                                          (Vj)≠0 
175ij6       Vi,VM Vj,P      V Logical    VM=1 and (Vi)=element index if 
                                          (Vj) positive 
175ij7       Vi,VM Vj,M      V Logical    VM=1 and (Vi)=element index if 
                                          (Vj) negative 
176i0k       Vi ,A0,Ak       Memory       Read (VL) words to Vi from 
                                          (A0) incremented by (Ak) 
†176i00      Vi ,A0,1        Memory       Read (VL) words to Vi from (A0) 
                                          incremented by 1 
176i1k       Vi ,A0,Vk       Memory       Read (VL) words to Vi, using 
                                          (A0) + (Vk) 
1770jk       ,A0,Ak Vj       Memory       Store (VL) words from Vj to (A0) 
                                          incremented by (Ak) 
†1770j0      ,A0,1 Vj        Memory       Store (VL) words from Vj to (A0) 
                                          incremented by 1 
1771jk       ,A0,Vk Vj       Memory       Store (VL) words from Vj using 
                                          (A0) + (Vk) 

† Special syntax form 



B. 6 MBYTE PER SECOND CHANNEL DESCRIPTION 



INTRODUCTION 

Each input or output 6 Mbyte per second channel directly accesses Central 
Memory. Input channels store external data in memory and output channels 
read data from memory. A primary task of a channel is to convert 64-bit 
Central Memory words into 16-bit parcels or 16-bit parcels into 64-bit 
Central Memory words. Four parcels make up one Central Memory word with 
bits of the parcels assigned to memory bit positions (see section 2 of 
this publication) . 
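
The conversion between 64-bit words and 16-bit parcels can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, and the parcel order (bits 2^63 through 2^48 first, bits 2^15 through 2^0 last) is taken from the signal exchange tables in this appendix; the exact bit assignment is defined in section 2 and is not reproduced here.

```python
def assemble_word(parcels):
    """Assemble four 16-bit parcels into one 64-bit word (input channel).

    The first parcel received is assumed to carry bits 2^63 - 2^48.
    """
    if len(parcels) != 4:
        raise ValueError("a Central Memory word is made of four parcels")
    word = 0
    for parcel in parcels:
        word = (word << 16) | (parcel & 0xFFFF)
    return word


def disassemble_word(word):
    """Split a 64-bit word into four 16-bit parcels (output channel)."""
    return [(word >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]


word = assemble_word([0x0123, 0x4567, 0x89AB, 0xCDEF])
print(hex(word), [hex(p) for p in disassemble_word(word)])
```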

Each input or output channel has a data channel (4 parity bits, 16 data 
bits, and 3 control lines), a 64-bit assembly or disassembly register, a 
channel Current Address (CA) register, and a channel Limit Address (CL) 
register. 

Three control signals (Ready, Resume, and Disconnect) coordinate the 
transfer of parcels over the channels. In addition to the three control 
signals, the output channel of the pair has a Master Clear line. 

This appendix describes the signal sequence of a 6 Mbyte per second input 
channel and an output channel. 



6 MBYTE PER SECOND INPUT CHANNEL SIGNAL SEQUENCE 

A general view of a 6 Mbyte per second input channel signal sequence is 
illustrated in table B-1. The data bits, parity bits, and each signal in 
the sequence are described below. 



DATA BITS 2^0 THROUGH 2^15 

Data bits 2^0, 2^1, ..., 2^15 are signals carrying the 16-bit parcel 
of data from the external device to Central Memory. The data bits must 
all be valid within 25 nanoseconds after the leading edge of the Ready 
signal. Data bit signals must remain unchanged on the lines until the 
corresponding Resume signal is received by the external device. 
Normally, data is sent coincidentally with the Ready signal and is held 
until the subsequent Ready signal. 






Table B-1. Input channel signal exchange 

       Central Memory            Channel    External Equipment 

  1.   Activate channel 
       (set CL and CA). 
  2.†                             <---      Data 2^63 - 2^48 with Ready 
  3.   Resume                     ---> 
  4.                              <---      Data 2^47 - 2^32 with Ready 
  5.   Resume                     ---> 
  6.                              <---      Data 2^31 - 2^16 with Ready 
  7.   Resume                     ---> 
  8.                              <---      Data 2^15 - 2^0 with Ready 
  9.   Write word to memory 
       and advance 
       current address. 
 10a.  Resume                     ---> 
 10b.  If (CA)=(CL), 
       go to 13. 
 11.                                        If more data, go to 2. 
 12.                              <---      Disconnect (ignored if 
                                            CA=CL or if channel 
                                            not active). 
 13.   Set interrupt and 
       deactivate channel. 

† Step 2 can initially precede step 1; that is, the first parcel and 
ready signal can arrive before requested. 



PARITY BITS 0 THROUGH 3 

Parity bits 0, 1, 2, and 3 are each assigned to a 4-bit group of data bits. 
The parity bits are set or cleared to give the bit group odd parity. Bit 
assignments follow. 

Parity bit    Data bits 

    0         2^0 - 2^3 
    1         2^4 - 2^7 
    2         2^8 - 2^11 
    3         2^12 - 2^15 



Parity bits are sent from the external device to Central Memory at the 
same time as data bits and are held stable in the same way as the data 
bits. 
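
The parity rule can be illustrated with a short sketch. The function names are hypothetical and the code only restates the rule given above: each parity bit is chosen so that its 4-bit data group plus the parity bit contains an odd number of ones.

```python
def parity_bits(parcel):
    """Return [parity 0, parity 1, parity 2, parity 3] for a 16-bit parcel.

    Parity bit n covers data bits 2^(4n) through 2^(4n+3) and is set
    when the group holds an even number of ones, giving odd parity.
    """
    bits = []
    for group in range(4):
        nibble = (parcel >> (4 * group)) & 0xF
        ones = bin(nibble).count("1")
        bits.append(1 if ones % 2 == 0 else 0)
    return bits


def parcel_is_valid(parcel, received_parity):
    """Check a received parcel against its four received parity bits."""
    return parity_bits(parcel) == list(received_parity)


print(parity_bits(0xF731))
```

An all-zero parcel therefore travels with all four parity bits set, so a dead data cable does not look like valid data.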



READY SIGNAL 

The Ready signal sent to Central Memory indicates a parcel of data is 
being sent to the Central Memory input channel and can be sampled. A 
Ready signal is a pulse 50 ±10 nanoseconds wide (at 50% voltage points). 
The leading edge of the Ready signal at Central Memory begins the timing 
for sampling the data bits. 



RESUME SIGNAL 

The Resume signal is sent from Central Memory to the external device 
showing the parcel was received and Central Memory is ready for the next 
data transmission. A Resume signal is a pulse 50 ±8 nanoseconds wide (at 
50% voltage points). 



DISCONNECT SIGNAL 

The Disconnect signal is sent from the external device to Central Memory 
and indicates transmission from the external device is complete. The 
Disconnect signal is sent after the Resume signal is received for the 
last Ready signal. A Disconnect signal is a pulse 50 ±10 nanoseconds 
wide (at the 50% voltage points). 



6 MBYTE PER SECOND OUTPUT CHANNEL SIGNAL SEQUENCE 

A general view of a 6 Mbyte per second output channel signal sequence is 
illustrated in table B-2. The data bits, parity bits, and each signal in 
the sequence are described following the table. 

Table B-2. Output channel signal exchange 

       Central Memory            Channel    External Equipment 

  1.   Activate channel 
       (set CL and CA). 
  2.   Read word from 
       memory and advance 
       current address. 
  3.   Data 2^63 - 2^48           ---> 
       with Ready 
  4.                              <---      Resume 
  5.   Data 2^47 - 2^32           ---> 
       with Ready 
  6.                              <---      Resume 
  7.   Data 2^31 - 2^16           ---> 
       with Ready 
  8.                              <---      Resume 
  9.   Data 2^15 - 2^0            ---> 
       with Ready 
 10.                              <---      Resume 
 11.   If (CA)≠(CL), 
       go to 2. 
 12.   Disconnect.                ---> 
 13.   Set interrupt and 
       deactivate channel. 







DATA BITS 2^0 THROUGH 2^15 

Data bits 2^0, 2^1, ..., 2^15 are signals carrying a 16-bit parcel of 
data from Central Memory to an external device. The data bits are sent 
concurrently within 5 nanoseconds of the leading edge of the Ready 
signal. Data bit signals remain steady on the lines until the Resume 
signal is received. 

PARITY BITS 0 THROUGH 3 

Parity bits 0, 1, 2, and 3 are each assigned to a 4-bit group of data 
bits. The parity bits are set or cleared to give the bit group odd 
parity. Bit assignments follow: 

Parity bit    Data bits 

    0         2^0 - 2^3 
    1         2^4 - 2^7 
    2         2^8 - 2^11 
    3         2^12 - 2^15 



Parity bits are sent from Central Memory to the external device at the 
same time as the data bits and are held stable in the same way as the 
data bits. 



READY SIGNAL 

The Ready signal sent from Central Memory to the external device 
indicates data is present and can be sampled. A Ready signal is a pulse 
50 ±8 nanoseconds wide (at 50% voltage points). The leading edge of the 
Ready signal can be used to time data sampling in the external device. 



RESUME SIGNAL 

The Resume signal is sent from the external device to Central Memory 
showing the parcel was received and the external device is ready for the 
next parcel transmission. A Resume signal is a pulse 50 ±10 nanoseconds 
wide (at 50% voltage points). 



DISCONNECT SIGNAL 

The Disconnect signal is sent from Central Memory to the external device 
and indicates transmission from Central Memory is complete. The 
Disconnect signal is sent after Central Memory receives the Resume signal 
from the last Ready signal. A Disconnect signal is a pulse 50 ±8 
nanoseconds wide (at 50% voltage points). 
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PERFORMANCE MONITOR 



INTRODUCTION 

The system contains a set of eight performance counters to track certain 
hardware related events that can be used to indicate relative 
performance. The events that can be tracked are the number of specific 
instructions issued, hold issue conditions, the number of fetches, 
references, etc., and are selected through instruction 0015j0. Table 
C-1 lists all operations that can be monitored. 

Performance monitoring instructions allow you to select specific hardware 
related events for monitoring, read the results of the performance 
monitors into a scalar register, and test the operation of the 
performance counters. 

The instructions used for performance monitoring are: 

0015j0 Select performance monitor. 

073i11 Read performance counter into Si. 

073i21 Increment performance counter (maintenance). 

All instructions are executed in monitor mode. 



SELECTING PERFORMANCE EVENTS 

Instruction 0015j0 selects for monitoring one of the four groups of 
hardware related events shown in table C-1 and clears all performance 
monitors. The low-order 2 bits of the j field select the group. 

During each CP in non-monitor (user) mode, the performance counters 
advance their totals according to the number of monitored events that 
occur. Each of the performance counters can increment at a maximum rate 
of +3 per CP. This allows a counter to continuously monitor for 
approximately 62 hours before it is reset. 
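
The 62-hour figure can be checked with a short calculation. Two inputs are assumptions rather than statements from this paragraph: a counter width of 46 bits (inferred from the description of counter bits 0 through 45 under READING PERFORMANCE RESULTS) and a clock period of 9.5 nanoseconds. The names below are illustrative only.

```python
COUNTER_BITS = 46            # assumption: counter bits 0 through 45
MAX_INCREMENT_PER_CP = 3     # maximum rate stated above
CLOCK_PERIOD_NS = 9.5        # assumption: CPU clock period in nanoseconds


def hours_to_overflow(bits=COUNTER_BITS,
                      rate=MAX_INCREMENT_PER_CP,
                      cp_ns=CLOCK_PERIOD_NS):
    """Hours until a counter wraps when it advances at the maximum rate."""
    clock_periods = (1 << bits) / rate
    seconds = clock_periods * cp_ns * 1e-9
    return seconds / 3600.0


print(round(hours_to_overflow(), 1))
```

Under these assumptions the result is a little under 62 hours, consistent with the figure quoted above; a counter that advances by only +1 per CP would run about three times as long.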

Performance events are monitored only while operating in user 
(non-monitor) mode. Entering monitor mode disables advancing of the 
performance counters. 

Table C-1. Performance counter group descriptions 

Monitor     Performance    Description                              Increment 
Function    Counter                                                 Per CP 

j=0                        Number of: 
               0           Instructions issued                      +1 
               1           CPs holding issue                        +1 
               2           Fetches                                  +1 
               3           I/O references                           +1 
               4           CPU references                           +3 max 
               5           Floating-point add operations            +1 
               6           Floating-point multiply operations       +1 
               7           Floating-point reciprocal operations     +1 

j=1                        Hold issue conditions: 
               0           Semaphores                               +1 
               1           Shared registers                         +1 
               2           A registers and functionals              +1 
               3           S registers and functionals              +1 
               4           V registers                              +1 
               5           V functional units                       +1 
               6           Scalar memory                            +1 
               7           Block memory                             +1 

j=2                        Number of: 
               0           Fetches                                  +1 
               1           Scalar references                        +1 
               2           Scalar conflicts                         +1 
               3           I/O references                           +1 
               4           I/O conflicts                            +1 
               5           Block references                         +3 max 
               6           Block conflicts                          +3 max 
               7           Vector memory references                 +3 max 

j=3                        Number of: 
               0           000 - 017 instructions                   +1 
               1           020 - 137 instructions                   +1 
               2           140 - 157, 175 instructions              +1 
               3           160 - 174 instructions                   +1 
               4           176, 177 instructions                    +1 
               5           Vector integer operations                +3 max 
               6           Vector floating-point operations         +3 max 
               7           Vector memory references                 +3 max 
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READING PERFORMANCE RESULTS 

Performance counter totals can be read using instruction 073i11, which 
transmits either the high-order or low-order bits of a performance 
counter to the high-order bits of scalar register Si according to the 
contents of the performance counter pointer. 

Entering monitor mode disables advancing of all performance counters and 
clears the performance counter pointer. The first execution of a 
073i11 instruction reads the low-order bits of counter 0 into Si and 
increments the performance counter pointer. The second 073i11 
instruction reads the high-order bits of counter 0 into Si and again 
increments the pointer. After each 073i11 instruction, the performance 
counter pointer is advanced by 1. Even values of the pointer select the 
low-order bits of a performance counter to be read into Si; odd values 
of the pointer select the high-order bits of the performance counter to 
be read. 

Low-order bits 0 through 25 of the performance counter are read into bits 
32 through 57 of Si. High-order bits 26 through 45 of the performance 
counter are read into bits 38 through 57 of Si. 

A sequence for reading a set of performance counters appears as follows 
(there must be a 2 CP delay between sequential 073i11 instructions): 

073i11 Low-order bits of counter 0 to Si 

2 CP delay 

073i11 High-order bits of counter 0 to Si 

2 CP delay 

073i11 Low-order bits of counter 1 to Si 

2 CP delay 

073i11 High-order bits of counter 1 to Si 

2 CP delay 
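
The pointer and bit-field rules above can be sketched in software. The following Python model is only an illustration, not part of the hardware description: the function names are hypothetical, bit 0 is assumed to be the least significant bit of both the counter and Si, the Si bits outside the stated fields are assumed to read as zero, and the 2 CP delay is not modeled.

```python
LOW_BITS = 26    # counter bits 0-25
HIGH_BITS = 20   # counter bits 26-45


def read_073i11(counters, pointer):
    """Model one 073i11 read; return (Si value, advanced pointer)."""
    counter = counters[pointer // 2]
    if pointer % 2 == 0:
        # Even pointer: counter bits 0-25 go to Si bits 32-57.
        si = (counter & ((1 << LOW_BITS) - 1)) << 32
    else:
        # Odd pointer: counter bits 26-45 go to Si bits 38-57.
        si = ((counter >> LOW_BITS) & ((1 << HIGH_BITS) - 1)) << 38
    return si, pointer + 1


def read_all_counters(counters):
    """Read every counter, starting with the pointer cleared to 0."""
    pointer = 0  # cleared on entry to monitor mode
    totals = []
    for _ in counters:
        si_low, pointer = read_073i11(counters, pointer)
        si_high, pointer = read_073i11(counters, pointer)
        low = si_low >> 32
        high = si_high >> 38
        totals.append((high << LOW_BITS) | low)
    return totals
```

A test program would shift each Si value right by 32 or 38 places, as read_all_counters does, before joining the two halves into a 46-bit total.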



TESTING PERFORMANCE COUNTERS 

Instruction 073i21 is used to test the operation of the performance 
counters by incrementing the value stored in the counter while in monitor 
mode. 

Entering monitor mode disables advancing of all performance counters by 
user programs and clears the performance counter pointer. This pointer 
determines which performance counter, and which bits in that counter, 
will be incremented. Even values of the pointer increment bits 0 and 6 
of the performance counter when instruction 073i21 is executed; odd 
values of the pointer increment bit 26. The pointer is advanced from 
even to odd and to the next counter through instruction 073i11. 

There must be a 1 CP delay between sequential 073i21 instructions. 

Execution of instruction 073i21 loads register Si with all ones as a 
side effect of the basic 073 instruction. 



SECDED MAINTENANCE FUNCTIONS 



INTRODUCTION 

Modules involved with generating and interpreting the 8-bit check byte 
used for SECDED include logic that can be used for verifying check bit 
storage, check bit generation, and error detection and correction. 

The instructions used for these maintenance mode functions are: 

     001501    Set maintenance read mode. 

     001511    Load diagnostic check byte with S1. 

     001521    Set maintenance write mode 1. 

     001531    Set maintenance write mode 2. 

     073i31    Clear all maintenance modes. 

These instructions are all executed in monitor mode, and for instructions 
0015XX, the maintenance mode switch (located on the mainframe's control 
panel) must be on or the instructions become no-ops. 
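
As a rough illustration of the gating just described, the sketch below tracks which maintenance modes are active after each instruction. It is a simplified model under stated assumptions: the function and mode names are hypothetical, instructions issued outside monitor mode are assumed to have no effect on the modes, and 001511 is left out of the table because it loads a check byte and does not set a mode.

```python
# Mode set by each 0015xx mode-setting instruction (names are illustrative).
MODE_SET_BY = {"001501": "read", "001521": "write 1", "001531": "write 2"}


def execute(active_modes, opcode, monitor_mode, switch_on):
    """Return the set of active maintenance modes after one instruction."""
    if not monitor_mode:
        return active_modes      # assumption: no effect outside monitor mode
    if opcode == "073i31":
        return frozenset()       # clears all modes; the switch is not consulted
    if opcode in MODE_SET_BY and switch_on:
        return active_modes | {MODE_SET_BY[opcode]}
    return active_modes          # switch off: a 0015xx instruction is a no-op
```

Starting from an empty set, a 001501 issued with the switch off leaves the set empty, while the same instruction with the switch on adds the read mode.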



VERIFICATION OF CHECK BIT STORAGE 

To verify the storage ability of the SECDED check bits without moving 
memory modules, two instructions are used: 001501 and 001521. 

The maintenance write mode 1 instruction, 001521, replaces the 8 check 
bits generated by the SECDED circuitry with specific bits of a data word 
as it is written into memory. The maintenance read mode instruction, 
001501, complements the write instruction by replacing the same bits of a 
data word with the 8 check bits as it is read from memory. 

By using the instructions together (and with error correction disabled 
through the switch on the mainframe's control panel), specified bits of a 
data word are stored and read back through the check bit storage paths 
and verification of SECDED check bit storage operation is accomplished. 

Instruction 001521, maintenance write mode 1, and 001501, maintenance 
read mode, replace data bits with check bits and vice versa as shown 
below. 

          Data bit          Check bit 

             46                 0 
             47                 1 
             62                 2 
             63                 3 
             14                 4 
             15                 5 
             30                 6 
             31                 7 

     Write (001521): data bit to check bit. 
     Read (001501): check bit to data bit. 
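
The correspondence between data bits and check bits can be written as a small lookup table. The Python sketch below is illustrative only: the function names are hypothetical, and bit 0 is assumed to be the least significant bit of the 64-bit data word and of the check byte.

```python
# Check bit k travels through the data bit at DATA_BIT_FOR_CHECK_BIT[k].
DATA_BIT_FOR_CHECK_BIT = (46, 47, 62, 63, 14, 15, 30, 31)


def check_byte_from_data(word):
    """Write mode 1: gather the mapped data bits into an 8-bit check byte."""
    byte = 0
    for check_bit, data_bit in enumerate(DATA_BIT_FOR_CHECK_BIT):
        byte |= ((word >> data_bit) & 1) << check_bit
    return byte


def data_with_check_byte(word, check_byte):
    """Read mode: overwrite the mapped data bits with the check byte."""
    for check_bit, data_bit in enumerate(DATA_BIT_FOR_CHECK_BIT):
        word &= ~(1 << data_bit)
        word |= ((check_byte >> check_bit) & 1) << data_bit
    return word
```

A storage test built this way writes a pattern in the mapped data bits, reads it back through the check bit path, and compares the two check bytes.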



VERIFICATION OF CHECK BIT GENERATION 

The maintenance read mode instruction, 001501, is used to verify the 
correct generation of SECDED check bits for a word of data. 

When the instruction is executed, the 8 check bits for SECDED replace 
specific data bits as the word is read from memory (as shown above). A 
test program can easily extract these check bits and verify their 
correctness, thus checking the accuracy of the SECDED check bit circuitry. 

Since the CPU replaces the data bits with check bits on all reads to 
memory until instruction 073i31 is executed (including fetch, scalar 
and vector reads, and I/O for the CPU), the test program should initially 
rewrite all of memory using the 001501 instruction to set up the SECDED 
check bits for a subsequent read by fetch or I/O. 

Error correction must be disabled during this test. 



VERIFICATION OF ERROR DETECTION AND CORRECTION 

The maintenance write mode 2 instruction, 001531, and the load diagnostic 
check byte with S1 instruction, 001511, are used to verify operation of 
the SECDED circuitry. 

To verify operation, a diagnostic check byte is initially loaded with the 
upper-order bits of register S1 through instruction 001511 as shown below: 

                            Diagnostic 
          S1 bit            check bit 

            56                  0 
            57                  1 
            58                  2 
            59                  3 
            60                  4 
            61                  5 
            62                  6 
            63                  7 


This diagnostic check byte is then written into memory in place of the 
normal SECDED check bits on any subsequent CPU write to memory (writes 
from I/O through this CPU are not affected). With error correction 
enabled (through the switch on the mainframe's control panel), a 
subsequent read of the memory location allows different paths within the 
error detection and correction circuitry to be checked out. 

The diagnostic check byte retains its value until a new one is entered. 
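
A test program has to place the wanted check byte in the upper-order bits of S1 before issuing 001511. The short Python sketch below shows that packing and the matching extraction. The function names are hypothetical, and bit 0 is assumed to be the least significant bit of S1.

```python
def s1_for_check_byte(check_byte):
    """Build an S1 value whose bits 56-63 hold diagnostic check bits 0-7."""
    return (check_byte & 0xFF) << 56


def diagnostic_check_byte(s1):
    """Extract the diagnostic check byte that 001511 would take from S1."""
    return (s1 >> 56) & 0xFF
```

For example, a check byte with only check bit 3 set corresponds to an S1 value with only bit 59 set.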



CLEARING MAINTENANCE MODE FUNCTIONS 

Instruction 073i31, clear all maintenance modes, clears the modes set by 
the following maintenance mode instructions: 

001501 Set maintenance read mode. 

001521 Set maintenance write mode 1. 

001531 Set maintenance write mode 2. 

A Master Clear also clears these modes. 

As a side effect of the 073i31 instruction, Si is loaded with all 
ones. 



INDEX 



1-parcel instruction format 

with combined j and k fields, 5-2 

with discrete j and k fields, 5-1 
100 Mbyte per second channel, 2-12 
1250 Mbyte per second channel, 2-12 
2-parcel instruction format 

with combined i, j, k, and m fields, 5- 

with combined j, k, and m fields, 5-2 
6 Mbyte per second channel, 2-13 

data bits, B-1, B-4 

descriptions, B-1 

disconnect signal, B-3, B-5 

input channel error conditions, 2-17 

input channel programming, 2-17 

input channel signal exchange, B-2 

input signal sequence, B-1 

instructions, 2-15 

I/O interrupts, 2-16 

I/O program flowchart, 2-18 

multi-CPU programming, 2-15 

operation, 2-16 

output channel signal, B-4 

output programming, 2-18 

output signal sequence, B-3 

parity bits, B-2, B-5 

programmed master clear to external 
device, 2-19 

ready signal, B-3, B-5 

resume signal, B-3, B-5 

word assembly/disassembly, 2-17 
8-bit check byte, 2-7 
8-bit Status register, 4-8 



A registers, 4-3 

Access priorities, Central Memory, 2-6 

Access time, memory, 2-1 

Active Exchange Package, 3-13 

Addition algorithm, 4-27 

Addition, floating-point, 4-27 

Address Add functional unit, 4-14 

Address functional units, 4-14 

Address Multiply functional unit, 4-14 

Address processing, 4-1 

Address registers, 4-3 

Addressing, memory, 2-3 

Algorithm 

addition, 4-27 

derivation of division, 4-30 

division, 4-28 

multiplication, 4-27 
AND function, 4-35 



Arithmetic 

floating-point, 4-22 

integer, 4-21 

operations, 4-21 
Auxiliary I/O Processor (XIOP), 1-10 



B registers, 4-5 

Bank busy conflict, 2-5 

Banks, 2-1 

Beginning Address register, 3-3 

Bidirectional Memory Mode (BDM) flag, 3-9 

Bidirectional memory references, 2-4 

BIOP, see Buffer I/O Processor 

Block reads and writes, concurrent, 2-4 

Block transfer references, 2-3 

Branching, forward and backward, 3-4 

Buffer I/O Processor (BIOP), 1-10 

Buffers, instruction, 3-3 



CA register, see Current Address register 
Central Memory 

access ports, 2-3 

access priorities, 2-6 

error correction, 2-6 

access time, 2-1 

access, 2-3, 2-19 

addressing, 2-3 

banks, 2-1 

conflict resolution, 2-5 

cycle time, 2-1 

features, 2-1 

organization, 2-2 

ports, 2-3 

references per clock period, 2-2 

sections, 2-2 

size, 2-1 

transfer rates, 2-1 

types of conflict, 2-5 

word size, 2-1 
Central Processing Unit 

computation section, 4-1 

control and data paths, 1-7 

control sections, 3-1 

input/output section, 2-12 

instructions, 5-1 

shared resources, 2-1 

speed, 1-3 
Channel (see also 6 Mbyte per second 
channel) 

100 Mbyte per second, 2-12 

1250 Mbyte per second, 2-12 

6 Mbyte per second, 2-13 

features, 2-13 

groups, 2-19 

input/output data paths, 2-22 

I/O control, 2-21 

numbers, 2-20 

types, 1-3 
Channel Limit Address register (CL), 2-16 
Characteristics of system, 1-3 
Check bits, 2-7 
CIP register, see Current Instruction 

Parcel register 
CL register, see Channel Limit register 
Clear programmable clock interrupt request, 

3-20 
Clearing maintenance mode functions, D-3 
CLN register, see Cluster Number register 
Clock 

programmable, 3-19 

real-time, 2-9 
Clock period, 1-4 

Cluster Number (CLN) register, 2-10, 3-12 
Communication, inter-CPU, 2-9 
Computation section, characteristics, 4-2 
Concurrent reads and writes, block, 2-4 
Condensing units, 1-13 
Configurations of system, 1-16 
Conflict resolution, Central Memory, 2-5 
Conflicts 

bank busy, 2-5 

memory bank, 2-20 

section access, 2-6 

shared register and semaphore, 2-12 

simultaneous bank, 2-6 
Control and data paths for a single CPU, 1-6 
Conventions, 1-1 
Correctable Memory Error Mode (ICM) flag, 

3-10 
Counter, Interrupt Countdown (ICD), 3-20 
CP, see clock period 
CPU, see Central Processing Unit 
CSB (read address), 3-9 
Current Address (CA) register, 2-16 
Current Instruction Parcel (CIP) register, 

3-2 
Cycle time, 2-1 



Data Base Address (DBA) register, 3-18 
Data format 

floating-point, 4-22 

integer, 4-21 
Data Limit Address (DLA) register, 3-18 
Deadlock (DL) flag, 3-11 
Deadstart sequence, 3-21 

Derivation of the division algorithm, 4-30 
DIOP, see Disk I/O Processor 
Disk controller unit (DCU), 1-10 
Disk I/O Processor (DIOP), 1-10 



Disk storage units, 1-10 

Division algorithm, 4-28 

Division algorithm, derivation of, 4-30 

Double-precision numbers, 4-26 



E (error type), 3-8 

Enable second vector logical (ESVL), 3-7 
Enhanced addressing mode (EAM), 3-8 
Error correction, see also SECDED 

Central Memory, 2-6 

matrix, 2-8 
Error Exit (EEX) flag, 3-12 
Errors, floating-point range, 4-23 
Exchange 

initiating, 3-14 

mechanism, 3-5 
Exchange Address (XA) register, 3-5, 3-9 
Exchange Package, 3-5 

active, 3-13 

assignments, 3-7 

contents, 3-6 

enable Second Vector Logical, 3-7 

memory error data, 3-8 

management, 3-15 

processor number, 3-7 

registers, 3-9 

vector not used (VNU), 3-7 
Exchange sequence, 3-13 
Exchange sequence issue conditions, 3-15 
Exclusive NOR function, 4-35 
Exclusive OR function, 4-35 
Execution interval, 3-15 
Exponent matrix for Floating-point Multiply 

unit, 4-24 
External Interrupts flag, 3-10 



F register, see Flag register 

Fetch 

following scalar store, 2-5 
request, 2-4 

Flag (F) register, 3-11 

Flags 

Bidirectional Memory Mode (BDM), 3-9 
Correctable Memory Error Mode (ICM), 3-10 
Deadlock (DL), 3-11 
Error Exit (EEX), 3-12 
Exchange register flags, 3-11 
External Interrupts, 3-10 
Floating-point Error (FPE), 3-11 
Floating-point Error Mode (IFP), 3-10 
Floating-point Error Status (FPS), 3-9 
I/O Interrupt (IOI), 3-12 
Interrupt from Internal CPU (ICP), 3-11 
Interrupt Monitor Mode (IMM), 3-10 
MCU Interrupt (MCU), 3-11 
Memory Error (ME), 3-11 
Monitor Mode (MM), 3-10 
Normal Exit (NEX), 3-12 
Operand Range Error (ORE), 3-11, 3-19 
Operand Range Error Mode (IOR), 3-10 
Program Range Error (PRE), 3-11, 3-18 
Programmable Clock Interrupt (PCI), 3-11 
Selected for External Interrupts (SEI), 3-10 
Semaphore, 3-9 
Uncorrectable Memory Error Mode (IUM), 3-10 
Waiting for Semaphore (WS), 3-9 
Floating-point 

Add functional unit, 4-19 

Add functional unit range error, 4-23 

addition, 4-27 

arithmetic, 4-22 

data format, 4-22 

Error (FPE) flag, 3-11 

Error Mode (IFP) flag, 3-10 

Error Status (FPS) flag, 3-9 

functional units, 4-19 

integer multiply, 4-27 

Multiply functional unit, 4-20, 4-24 

multiply partial-product sums pyramid, 
4-29 

normalized numbers, 4-23 

range errors, 4-23 

range overflow, 4-23 

Reciprocal Approximation functional 
unit, 4-26 

subtraction, 4-27 
Forward and backward branching, 3-4 
Full Vector Logical functional unit, 4-17 
Functional units, 4-13 

Address, 4-14 

floating-point, 4-19 

Floating-point Add, 4-19, 4-23 

Floating-point Multiply, 4-20, 4-24 

Floating-point Reciprocal 
Approximation, 4-26 

Full Vector Logical, 4-17 

Reciprocal Approximation, 4-20 

scalar, 4-15 

Scalar Add, 4-15 

Scalar Logical, 4-16 

Scalar Population/Parity/Leading Zero, 
4-16 

Scalar Shift, 4-15 

Second Vector Logical, 4-18 

vector, 4-16 

Vector Add, 4-17 

Vector Logical, 4-17 

Vector Population/parity, 4-18 

vector reservation, 4-16 

Vector Shift, 4-17 



g field, 5-1 

General form for instructions, 5-1 

Group descriptions, performance counter, C-2 



i field, 5-1 

IBA register, see Instruction Base Address 

register 
ICD, see Interrupt Countdown counter 
II register, see Interrupt Interval register 
ILA register, see Instruction Limit Address 

register 
In-buffer condition, 3-4 
Inclusive OR function, 4-35 
Index generation, 4-17 
Input/output 

channel, references, 2-4 

channel types, 1-3 

data paths, 2-22 

Interrupt (IOI) flag, 3-12 

interrupt, 2-15 

lockout, 2-20 

memory addressing, 2-23 

memory conflicts, 2-23 

memory request conditions, 2-23 

priority, 2-6 

processors, types of, 1-8 

program flowchart, 2-18 

section, 2-12 

Subsystem, 1-8 

Subsystem, data transfer, 2-14 
Input signal sequence, 6 Mbyte per second 

channel, B-1 
I/O, see Input/output 
I/O Subsystem, data transfer, 2-14 
Instruction 

Base Address (IBA) register, 3-17 

buffers, 2-1, 3-3 

descriptions, 5-6 

fetches, 2-4 

issue, 5-5 

issue and control elements, 3-1 

issue to memory ports, 2-3 

Limit Address (ILA) register, 3-17 

summary for CRAY X-MP model 48, A-1 

parcel, 3-1 
Instruction formats 

1-parcel, 5-1, 5-2 

2-parcel, 5-2, 5-3 
Integer arithmetic, 4-21 
Integer data formats, 4-21 
Inter-CPU 

communication section, 2-9 

priority, 2-6 
Interfaces, 1-7 
Intermediate registers, 4-3 
Interrupt 

Countdown (ICD) Counter, 3-20 

from Internal CPU (ICP) flag, 3-11 

Interval (II) register, 3-19 

Monitor Mode (IMM) flag, 3-10 
Intra-CPU priority, 2-6 
Issue, 3-2 



h field, 5-1 

Half-precision multiply, 4-28 



j field, 5-1 
k field, 5-1 
Limit Address (CL) register, 2-16 

Logical operations, 4-35 

Lower Instruction Parcel (LIP) register, 3-3 



m field, 5-1 

Mask operation, 4-36 

Master Clear sequence, to external devices, 

2-19 
Master I/O Processor (MIOP), 1-9 
MCU interrupt (MCU) flag, 3-11 
Memory, see also Central Memory 

addressing, I/O, 2-23 

bank conflicts, 2-20 

conflicts, I/O, 2-23 

Error (ME) flag, 3-11 

error correction, see SECDED 

error data fields, 3-8 

field protection, 3-16 

field registers, 3-13 

request conditions, I/O, 2-23 
MIOP, see Master I/O Processor 
Mode (M) register, 3-9 
Monitor Mode (MM) flag, 3-10 
Motor-generator units, 1-15 
Multi-CPU programming, 2-15 
Multiplication algorithm, 4-27 
Multiply, half-precision, 4-28 
Multiply pyramid, 4-29 



Newton's method, 4-30 

Next Instruction Parcel (NIP) register, 3-2 

Normal Exit (NEX) flag, 3-12 

Normalized floating-point numbers, 4-23 

Notation conventions, 1-1 

Numbers 

double-precision, 4-26 
normalized floating-point, 4-23 



Operand Range Error 

flag (ORE), 3-11, 3-19 

Mode (IOR) flag, 3-10 
Operating registers, 4-3 
Organization 

system, 1-4 

memory, 2-2 
Out-of-buffer condition, 3- 



P register, see Program Address register 
Parallel vector operations, 4-11 
Parity error, 2-17 
Performance 

counter group descriptions, C-2 

monitor, 3-20, C-1 
Power distribution units, 1-14 
Priority 

inter-CPU, 2-6 

intra-CPU, 2-6 
Processor number, 3-7 

Program Address (P) register, 3-2, 3-13 
Program Range Error (PRE) flag, 3-11, 3-18 



Program State (PS) register, 3-12 
Programmable clock 

instructions, 3-19 

Interrupt (PCI) flag, 3-11 
Programmed Master Clear to external device, 
2-19 



R (read mode), 3-8 

Reading performance results, C-3 

Real-time Clock (RTC) register 

instructions, 2-9 
Reciprocal Approximation functional unit, 

4-20 
References, memory, 2-3 
Registers 

8-bit Status, 4-8 

A, address, 4-3 

B, 4-5 

Beginning Address, 3-3 
Channel Limit Address (CL), 2-16 
Cluster Number (CLN), 2-10, 3-12 
Current Address (CA), 2-16 
Current Instruction Parcel (CIP), 3-2 
Data Base Address (DBA), 3-18 
Data Limit Address (DLA), 3-18 
designators, 5-7 
Exchange Address (XA), 3-5, 3-9 
Flag (F), 3-11 
Instruction Base Address (IBA), 3-17 
Instruction Limit Address (ILA), 3-17 
Intermediate, 4-3 
Interrupt Interval (II), 3-19 
Limit Address (CL), 2-16 
Lower Instruction Parcel (LIP), 3-3 
memory field, 3-13 
Mode (M), 3-9 
Next Instruction Parcel (NIP), 3-2 
operating, 4-3 
Program Address (P), 3-2, 3-13 
Program State (PS), 3-12 
Real-time Clock (RTC), 2-9 
S, Scalar, 4-6 
Semaphore, 2-11 
shared, 2-10 
Shared Address (SB), 2-11 
Shared Scalar (ST), 2-11 
status, 4-8 
T, 4-8 
V, vector, 4-9 
Vector Control, 4-13 
Vector Length (VL), 4-13 
Vector Mask (VM), 4-13 
Reservations and chaining, V register, 4-12 



S (syndrome), 3-8 

S registers, 4-6 

Scalar 

Add functional unit, 4-15 
functional units, 4-15 
Logical functional unit, 4-16 
memory references, 2-4 
Population/parity/leading zero 
functional unit, 4-16 

registers, 4-6 

Shift functional unit, 4-15 
SECDED, 2-6 

maintenance functions, D-1 

memory data path, 2-7 
Second Vector Logical functional unit, 4-18 
Section access conflict, 2-6 
Sections, 2-2 
Selected for External Interrupts (SEI) 

flag, 3-10 
Selecting performance events, C-1 
Semaphore registers, 2-11 
Shared 

register and semaphore conflicts, 2-12 

registers, 2-10 

resources, 2-1 
Shared Address (SB) registers, 2-11 
Shared Scalar (ST) registers, 2-11 
Simultaneous bank conflict, 2-6 
Solid-state Storage Device, 1-11 

data transfer, 2-14 

chassis, 1-12 
Special characters, 5-6 
Special register values, 5-4 
SSD, see Solid-state Storage Device 
Status register, 4-8 
Syndrome, 2-7 
System 

basic organization, 1-5 

block diagram with block multiplexer 
channels, 1-17 

block diagram with full disk capacity, 
1-16 

characteristics, 1-3 

components, 1-4 

configuration, 1-16 

description, 1-1 

physical characteristics, 1-3 



Vector (continued) 

Length (VL) register, 4-13 

Logical functional units, 4-17 

Mask (VM) register, 4-13 

not used (VNU), 3-7 

Population/parity functional unit, 4-18 

registers, 4-9 

Shift functional unit, 4-17 
Verification of 

check bit generation, D-2 

check bit storage, D-1 

error detection and correction, D-2 
VNU - vector not used, 3-7 



Waiting for Semaphore (WS) flag, 3-9 
Word assembly/disassembly for 6 Mbyte per 

second channel, 2-17 
Word size, memory, 2-1 

XA register, see Exchange Address register 
XIOP, see Auxiliary I/O Processor 



T registers, 4-8 

Testing performance counters, C-3 

Time slot, 2-21 

Transfer rate 

instruction buffers, 2-1 

I/O section, 2-1 
Twos complement integer arithmetic, 4-21 



Uncorrectable Memory Error Mode (IUM) flag, 

3-10 
Unexpected Ready signal, 2-18 



V register reservations and chaining, 4- 

V registers, 4-9 
Vector 

Add functional unit, 4-17 
control registers, 4-13 
functional unit reservation, 4-16 
functional units, 4-16 